June 03, 2024

“Companies’ safety plans neglect risks from scheming AI” by Zach Stein-Perlman

11 minutes

Without countermeasures, a scheming AI could escape.

A safety case (for deployment) is an argument that it is safe to deploy a particular AI system in a particular way.[1] For any existing LM-based system, the developer can make a great safety case by demonstrating that the system does not have dangerous capabilities.[2] No dangerous capabilities is a straightforward and solid kind of safety case. For future systems with dangerous capabilities, a safety case will require an argument that the system is safe to deploy despite those dangerous capabilities. In this post, I discuss the safety cases that the labs are currently planning to make, note that they ignore an important class of threats—namely threats from scheming AI escaping—and briefly discuss and recommend control-based safety cases.

1. Safety cases implicit in current safety plans: no dangerous capabilities and mitigations to prevent misuse

Four documents both (a) are endorsed by [...]

---

Outline:

(00:53) 1. Safety cases implicit in current safety plans: no dangerous capabilities and mitigations to prevent misuse

(03:32) 2. Scheming AI and escape during internal deployment

(07:41) 3. Control techniques and control-based safety cases

The original text contained 19 footnotes which were omitted from this narration.

---

First published:

June 3rd, 2024

Source:

https://www.lesswrong.com/posts/mmDJWDX5EXv6rymtM/companies-safety-plans-neglect-risks-from-scheming-ai

---

Narrated by TYPE III AUDIO.

LessWrong (30+ Karma)

By LessWrong
