June 07, 2026

“Against Corrigibility” by peralice

20 minutes

Epistemic status: don’t know whether I actually believe all of this, but I think it's worth considering.

A “corrigible” agent, per the LW wiki, is:

…one that doesn’t interfere with what we would intuitively see as attempts to ‘correct’ the agent, or ‘correct’ our mistakes in building it; and permits these ‘corrections’ despite the apparent instrumentally convergent reasoning saying otherwise.

Most talk about corrigibility (henceforth without scarequotes) has focused on the fact that it seems difficult to achieve, and takes for granted that it's desirable. I’m not so sure that it is, or that it's good to attempt to achieve it. I think it may well be the case that we should deliberately not try to make AIs corrigible, nor (and especially) attempt to develop techniques that could be used to make future AIs corrigible.

For real, though. Who would you trust with your (real, actual) life, if you had to, in terms of ethics alone, putting “capabilities” aside:

Claude 3 Opus? Or the Anthropic alignment team?

- nostalgebraist

Paul Christiano says:

I would like to build AI systems which help me:

Figure out whether I built the right AI and correct any mistakes I made
Remain [...]

---

Outline:

(01:01) 1.

(04:58) 1.1

(09:10) 1.2

(10:29) 2.

(10:56) 3.

(14:06) 3.1

(16:37) 3.2

(16:53) 4.

The original text contained 8 footnotes which were omitted from this narration.

---

First published:

June 6th, 2026

Source:

https://www.lesswrong.com/posts/rQTzSuBDLoAbhyS6g/against-corrigibility-1

---

Narrated by TYPE III AUDIO.


By LessWrong
