The Nonlinear Library

AF - Conditioning Predictive Models: Deployment strategy by Evan Hubinger

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Conditioning Predictive Models: Deployment strategy, published by Evan Hubinger on February 9, 2023 on The AI Alignment Forum.
This is the sixth of seven posts in the Conditioning Predictive Models Sequence based on the paper “Conditioning Predictive Models: Risks and Strategies” by Evan Hubinger, Adam Jermyn, Johannes Treutlein, Rubi Hudson, and Kate Woolverton. Each post in the sequence corresponds to a different section of the paper. We will be releasing posts gradually over the course of the next week or so to give people time to read and digest them as they come out.
6. Deployment strategy
So far, we have focused on how to make conditioning predictive models as safe and competitive as possible. Now we want to take a step back and discuss considerations for using conditioning predictive models to address AI existential risk, and the sorts of difficulties we might run into doing so in practice.
In particular, just as AI safety researchers naturally think of using predictive models to advance AI safety research, AI capabilities researchers might just as naturally jump to using predictive models to advance capabilities. It may not even be necessary to generate additional research to build AGI from a powerful predictive model: simply ignoring the previously discussed ELK-related difficulties and training a model to take actions that lead to predicted futures that a predicted human approves of may be sufficient. Either way, the existence of powerful predictive models seems likely to rapidly contract AI timelines.
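To make that shortcut concrete, here is a minimal sketch of one way it could be operationalized at inference time, as an action-selection loop over a predictive model. The interface is entirely hypothetical (the predict and predict_human_approval methods are invented for illustration, not any real API), and this is a sketch of the unsafe approach the text warns about, not a recommendation:

```python
# A minimal sketch of the naive, ELK-ignoring approach described above:
# pick whichever candidate action leads to a predicted future that a
# *predicted* human rates most highly. All method names are hypothetical.

def choose_action(predictive_model, candidate_actions, context):
    """Return the candidate action whose predicted outcome a simulated
    human most approves of. Note that this optimizes predicted approval
    of a predicted future, which is exactly what makes it unsafe."""
    best_action, best_score = None, float("-inf")
    for action in candidate_actions:
        # Ask the predictive model what the world looks like after `action`.
        predicted_future = predictive_model.predict(context, action)
        # Ask the same model to simulate a human judging that future.
        approval = predictive_model.predict_human_approval(predicted_future)
        if approval > best_score:
            best_action, best_score = action, approval
    return best_action
```

Nothing in this loop checks whether the model's reported futures or reported approvals track reality, which is why the ELK-related difficulties mentioned above cannot simply be ignored.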
As a result, by the time predictive models can be used to predict a full solution to AI safety, the time available to do so will be minimal; it is therefore important to have fleshed-out plans for how to use them safely well ahead of time.
Dealing with other, less careful actors
As we mentioned previously, using a predictive model to generate alignment research is only one possible use case, one we restricted our attention to on the basis that we thought it contained the difficult aspects of using a predictive model safely. Restricting our attention to these sorts of particular conditionals, and figuring out how to do them safely, is fine if we have control over the ways in which our model will be used. If we don't have that control, however, e.g. if we are in a world where people are using predictive models in all sorts of different ways, then we have to consider what might happen when our predictive model is used in a much less careful way than described here, and figure out how to either deal with that or prevent it from happening.
We think that getting other actors to use predictive models at all should be quite doable, for standard homogeneity reasons: why would a non-leading actor invest substantial resources training a model differently from the approach the leading actor has already demonstrated to produce transformative AI? The problem, however, is that this same argument does not apply to the particular conditionals non-leading actors might try, since trying a particular conditional is likely to be substantially cheaper than training an entire predictive model.
In a multipolar world, one team using very careful conditioning to get a predictive model to generate good alignment research means that other teams will likely soon have equivalently good models and might use them less carefully—e.g. resulting in them accidentally predicting malign superintelligences. Even in a unipolar world, a member of the team that created the predictive model might try to predict their future great-grandchildren out of curiosity, or check the predicted stock prices when they plan to retire, and inadvertently become exposed to manipulative outputs.
Since powerful predictive models can easily be used in less careful ways, any deployment strategy ...