The Nonlinear Library

LW - On 'Responsible Scaling Policies' (RSPs) by Zvi



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On 'Responsible Scaling Policies' (RSPs), published by Zvi on December 6, 2023 on LessWrong.
This post was originally intended to come out directly after the UK AI Safety Summit, to give the topic its own deserved focus. One thing led to another, and I am only doubling back to it now.
Responsible Deployment Policies
At the AI Safety Summit, all the major Western players were asked: What are your company policies on how to keep us safe? What are your responsible deployment policies (RDPs)? Except that they call them Responsible Scaling Policies (RSPs) instead.
I deliberately say deployment rather than scaling. No one has shown what I would consider close to a responsible scaling policy in terms of what models they are willing to scale and train.
Anthropic at least does however seem to have something approaching a future responsible deployment policy, in terms of how to give people access to a model if we assume it is safe for the model to exist at all and for us to run tests on it. And we have also seen plausibly reasonable past deployment decisions from OpenAI regarding GPT-4 and earlier models, with extensive and expensive and slow red teaming including prototypes of ARC (they just changed names to METR, but I will call them ARC for this post) evaluations.
I also would accept as alternative names any of Scaling Policies (SPs), AGI Scaling Policies (ASPs) or even Conditional Pause Commitments (CPCs).
For existing models we know about, the danger lies entirely in deployment. That will change over time.
I am far from alone in my concern over the name, here is another example:
Oliver Habryka: A good chunk of my concerns about RSPs are specific concerns about the term "Responsible Scaling Policy".
I also feel like there is a disconnect and a bit of a Motte-and-Bailey going on where we have like one real instance of an RSP, in the form of the Anthropic RSP, and then some people from ARC Evals who have I feel like more of a model of some platonic ideal of an RSP, and I feel like they are getting conflated a bunch.
I do really feel like the term "Responsible Scaling Policy" clearly invokes a few things which I think are not true:
How fast you "scale" is the primary thing that matters for acting responsibly with AI
It is clearly possible to scale responsibly (otherwise what would the policy govern)
The default trajectory of an AI research organization should be to continue scaling
ARC Evals defines an RSP this way:
An RSP specifies what level of AI capabilities an AI developer is prepared to handle safely with their current protective measures, and conditions under which it would be too dangerous to continue deploying AI systems and/or scaling up AI capabilities until protective measures improve.
I agree with Oliver that this paragraph should be modified to 'claims they are prepared to handle' and 'they claim it would be too dangerous.' This is an important nitpick.
Nate Soares has thoughts on what the UK asked for, which could be summarized as 'mostly good things, better than nothing, obviously not enough' and of course it was never going to be enough and also Nate Soares is the world's toughest crowd.
How the UK Graded the Responses
How did various companies do on the requests? Here is how the UK graded them.
That is what you get if you were grading on a curve one answer at a time.
Reality does not grade on a curve. Nor is one question at a time the best method.
My own analysis, and that of others I trust, agrees that this relatively underrates OpenAI, which clearly had the second best set of policies by a substantial margin; one source even put OpenAI on par with Anthropic, although I disagree with that. Otherwise the relative rankings seem correct.
Looking in detail, what to make of the responses? That will be the next few sections.
Answers ranged from Anthropic's att...