LessWrong (30+ Karma)

“Automation collapse” by Geoffrey Irving, Tomek Korbak, Benjamin Hilton



Summary: If we validate automated alignment research through empirical testing, the safety assurance work will still need to be done by humans, and will be similar to that needed for human-written alignment algorithms.

Three levels of automated AI safety

Automating AI safety means developing some algorithm which takes in data and outputs safe, highly-capable AI systems. Let's imagine three ways of developing this algorithm:

  1. Human-written algorithm, AI details: Humans write down an overall AI safety algorithm, and use AI systems to fill in a bunch of the details. The humans are confident the details provided by the AI systems don’t compromise the safety of the algorithm. This category includes scalable oversight, semi-automated interpretability (LLMs explain each neuron or SAE feature), and using LLMs for scaled formalisation of a spec.
  2. AI-written algorithm, checked empirically: The humans might have some rough idea what overall scheme is good, but the AI is going to [...]

---

Outline:

(00:19) Three levels of automated AI safety

(03:09) A model of empirically checked automated safety

(07:18) We should understand if the collapsed alignment scheme is safe

(09:45) We should ask if less automation is possible

(12:11) Checking the algorithms avoids collapse, but is hard

(14:11) Acknowledgements

The original text contained 1 image which was described by AI.

---

First published: October 21st, 2024

Source: https://www.lesswrong.com/posts/2Gy9tfjmKwkYbF9BY/automation-collapse

---

Narrated by TYPE III AUDIO.
