
Without countermeasures, a scheming AI could escape.
A safety case (for deployment) is an argument that it is safe to deploy a particular AI system in a particular way.[1] For any existing LM-based system, the developer can make a great safety case by demonstrating that the system does not have dangerous capabilities.[2] No dangerous capabilities is a straightforward and solid kind of safety case. For future systems with dangerous capabilities, a safety case will require an argument that the system is safe to deploy despite those dangerous capabilities. In this post, I discuss the safety cases that the labs are currently planning to make, note that they ignore an important class of threats—namely threats from scheming AI escaping—and briefly discuss and recommend control-based safety cases.
1. Safety cases implicit in current safety plans: no dangerous capabilities and mitigations to prevent misuse
Four documents both (a) are endorsed by [...]
---
Outline:
(00:53) 1. Safety cases implicit in current safety plans: no dangerous capabilities and mitigations to prevent misuse
(03:32) 2. Scheming AI and escape during internal deployment
(07:41) 3. Control techniques and control-based safety cases
The original text contained 19 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.
By LessWrong