Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Dario Amodei's prepared remarks from the UK AI Safety Summit, on Anthropic's Responsible Scaling Policy, published by Zac Hatfield-Dodds on November 1, 2023 on The AI Alignment Forum.
I hope Dario's remarks to the Summit can shed some light on how we think about RSPs in general and Anthropic's RSP in particular, both of which have been discussed extensively since I shared our RSP announcement. The full text of Dario's remarks follows:
Before I get into Anthropic's Responsible Scaling Policy (RSP), it's worth explaining some of the unique challenges around measuring AI risks that led us to develop it. The most important thing to understand about AI is how quickly it is moving. A few years ago, AI systems could barely string together a coherent sentence. Today they can pass medical exams, write poetry, and tell jokes. This rapid progress is ultimately driven by the amount of available computation, which is growing by 8x per year and is unlikely to slow down in the next few years. The general trend of rapid improvement is predictable; however, it is very difficult to predict when AI will acquire specific skills or knowledge. This unfortunately includes dangerous skills, such as the ability to construct biological weapons. We are thus facing a number of potential AI-related threats which, although relatively limited given today's systems, are likely to become very serious at some unknown point in the near future. This is very different from most other industries: imagine if each new model of car had some chance of spontaneously sprouting a new (and dangerous) power, like the ability to fire a rocket boost or accelerate to supersonic speeds.
We need both a way to frequently monitor these emerging risks, and a protocol for responding appropriately when they occur. Responsible scaling policies - initially suggested by the Alignment Research Center - attempt to meet this need. Anthropic published its RSP in September, and was the first major AI company to do so. It has two major components:
First, we've come up with a system called AI safety levels (ASL), loosely modeled after the internationally recognized BSL system for handling biological materials. Each ASL level has an if-then structure: if an AI system exhibits certain dangerous capabilities, then we will not deploy it or train more powerful models, until certain safeguards are in place.
Second, we test for these dangerous capabilities at regular intervals along the compute scaling curve. This ensures that we don't blindly create dangerous capabilities without even knowing we have done so.
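The two components above amount to a simple gating protocol: evaluate for dangerous capabilities at checkpoints along the compute scaling curve, and if a capability threshold is crossed, pause deployment and further scaling until the required safeguards exist. A minimal sketch of that if-then structure follows; all names here (`Checkpoint`, `may_continue_scaling`, the 4x evaluation interval) are illustrative assumptions, not Anthropic's actual evaluation tooling.

```python
from dataclasses import dataclass

# Hypothetical types/functions -- a sketch of the RSP's if-then gating,
# not a real implementation of Anthropic's evaluation process.

@dataclass
class Checkpoint:
    compute_flops: float                 # training compute at this checkpoint
    dangerous_capability_detected: bool  # outcome of capability evaluations

def may_continue_scaling(checkpoint: Checkpoint, safeguards_in_place: bool) -> bool:
    """If the model exhibits a dangerous capability and the safeguards for
    the next ASL level are not yet in place, pause: do not deploy it or
    train more powerful models. Otherwise, scaling may continue."""
    if checkpoint.dangerous_capability_detected and not safeguards_in_place:
        return False
    return True

def next_eval_threshold(current_flops: float, factor: float = 4.0) -> float:
    """Schedule the next capability evaluation at a fixed multiple of
    training compute (the 4x factor is an illustrative choice)."""
    return current_flops * factor
```

The point of the sketch is the ordering of checks: evaluation happens before the capability is needed in deployment, so a dangerous capability is caught at a compute checkpoint rather than discovered after release.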
In our system, ASL-1 represents models with little to no risk - for example a specialized AI that plays chess. ASL-2 represents where we are today: models that have a wide range of present-day risks, but do not yet exhibit truly dangerous capabilities that could lead to catastrophic outcomes if applied to fields like biology or chemistry. Our RSP requires us to implement present-day best practices for ASL-2 models, including model cards, external red-teaming, and strong security.
ASL-3 is the point at which AI models become operationally useful for catastrophic misuse in CBRN areas, as defined by experts in those fields and as compared to existing capabilities and proofs of concept. When this happens we require the following measures:
Unusually strong security measures such that non-state actors cannot steal the weights, and state actors would need to expend significant effort to do so.
Despite being (by definition) inherently capable of providing information that operationally increases CBRN risks, the deployed versions of our ASL-3 models must never produce such information, even when red-teamed by world experts in this area working together with AI engineers. This will require research breakthroughs...