
Without countermeasures, a scheming AI could escape.
A safety case (for deployment) is an argument that it is safe to deploy a particular AI system in a particular way.[1] For any existing LM-based system, the developer can make a great safety case by demonstrating that the system does not have dangerous capabilities.[2] No dangerous capabilities is a straightforward and solid kind of safety case. For future systems with dangerous capabilities, a safety case will require an argument that the system is safe to deploy despite those dangerous capabilities. In this post, I discuss the safety cases that the labs are currently planning to make, note that they ignore an important class of threats—namely threats from scheming AI escaping—and briefly discuss and recommend control-based safety cases.
1. Safety cases implicit in current safety plans: no dangerous capabilities and mitigations to prevent misuse
Four documents both (a) are endorsed by [...]
---
Outline:
(00:53) 1. Safety cases implicit in current safety plans: no dangerous capabilities and mitigations to prevent misuse
(03:32) 2. Scheming AI and escape during internal deployment
(07:41) 3. Control techniques and control-based safety cases
The original text contained 19 footnotes which were omitted from this narration.
---
Narrated by TYPE III AUDIO.
