November 14, 2025

“AI Corrigibility Debate: Max Harms vs. Jeremy Gillen” by Liron, Max Harms, Jeremy Gillen

2 hours 24 minutes

Is focusing on corrigibility our best shot at getting to ASI alignment?

Max Harms and Jeremy Gillen are current and former MIRI alignment researchers who both see superintelligent AI as an imminent extinction threat, but disagree about Max's proposal of Corrigibility as Singular Target (CAST).

Max thinks focusing on corrigibility is the most plausible path to build ASI without losing control and dying, while Jeremy is skeptical that attempting CAST would lead to better superintelligent AI behavior on a sufficiently early try.

We recorded a friendly debate to understand the crux of Max and Jeremy's disagreement. The conversation also doubles as a way to learn about Max's Corrigibility as Singular Target proposal.

Video

Podcast

Listen on Spotify, import the RSS feed, or search "Doom Debates" in your podcast player.

Plus: Max's New Book, Red Heart

Max just published Red Heart, a realistic sci-fi thriller that brings the corrigibility problem to life through a high-stakes Chinese government AI project.

I thoroughly enjoyed reading it and highly recommend it! The last 20 minutes of my conversation with Max are all about Red Heart.

Transcript

Episode Preview

Max Harms 00:00:00
If you mess up real bad, this thing goes and eats [...]

---

Outline:

(00:14) Is focusing on corrigibility our best shot at getting to ASI alignment?

(01:08) Video

(01:14) Podcast

(01:24) Plus: Max's New Book, Red Heart

(01:55) Transcript

(01:58) Episode Preview

(13:32) Why Corrigibility Matters

(15:20) What's Your P(Doom)™

(20:42) Max's Case for Corrigibility

(23:46) Jeremy's Case Against Corrigibility

(26:21) Max's Mainline AI Scenario

(32:44) 4 Strategies: Alignment, Control, Corrigibility, Don't Build It

(41:53) Corrigibility vs HHH (Helpful, Harmless, Honest)

(47:31) Asimov's 3 Laws of Robotics

(52:45) Is Corrigibility a Coherent Concept?

(01:03:21) Corrigibility vs Shutdown-ability

(01:09:21) CAST: Corrigibility as Singular Target, Near Misses, Iterations

(01:19:20) Debating if Max is Over-Optimistic

(01:32:46) Debating if Corrigibility is the Best Target

(01:39:58) Would Max Work for Anthropic?

(01:42:37) Max's Modest Hopes

(02:02:35) Max's New Book: Red Heart

(02:21:52) Outro

---

First published:

November 14th, 2025

Source:

https://www.lesswrong.com/posts/CsXAg8dHSgghDAoPx/ai-corrigibility-debate-max-harms-vs-jeremy-gillen

---

Narrated by TYPE III AUDIO.


By LessWrong
