Get ready for a rematch with the one & only Bentham’s Bulldog, a.k.a. Matthew Adelstein! Our first debate covered a wide range of philosophical topics.
Today’s Debate #2 is all about Matthew’s new argument against the inevitability of AI doom. He comes out swinging with a calculated P(Doom) of just 2.6%, based on a multi-step probability chain that I challenge as potentially falling into a “Type 2 Conjunction Fallacy” (a.k.a. the Multiple Stage Fallacy).
We clash on whether to expect “alignment by default” and the nature of future AI architectures. While Matthew sees current RLHF success as evidence that AIs will likely remain compliant, I argue that we’re building “Goal Engines” — superhuman optimization modules that act like nuclear cores wrapped in friendly personalities. We debate whether these engines can be safely contained, or if the capability to map goals to actions is inherently dangerous and prone to exfiltration.
Despite our different forecasts (my 50% vs his sub-10%), we actually land in the “sane zone” together on some key policy ideas, like the potential necessity of a global pause.
While Matthew’s case for low P(Doom) hasn’t convinced me, I consider his post and his engagement with me to be super high quality and good faith. We’re not here to score points; we just want to better predict how the intelligence explosion will play out.
Timestamps
00:00:00 — Teaser
00:00:35 — Bentham’s Bulldog Returns to Doom Debates
00:05:43 — Higher-Order Evidence: Why Skepticism is Warranted
00:11:06 — What’s Your P(Doom)™
00:14:38 — The “Multiple Stage Fallacy” Objection
00:21:48 — The Risk of Warring AIs vs. Misalignment
00:27:29 — Historical Pessimism: The “Boy Who Cried Wolf”
00:33:02 — Comparing AI Risk to Climate Change & Nuclear War
00:38:59 — Alignment by Default via Reinforcement Learning
00:46:02 — The “Goal Engine” Hypothesis
00:53:13 — Is Psychoanalyzing Current AI Valid for Future Systems?
01:00:17 — Winograd Schemas & The Fragility of Value
01:09:15 — The Nuclear Core Analogy: Dangerous Engines in Friendly Wrappers
01:16:16 — The Discontinuity of Unstoppable AI
01:23:53 — Exfiltration: Running Superintelligence on a Laptop
01:31:37 — Evolution Analogy: Selection Pressures for Alignment
01:39:08 — Commercial Utility as a Force for Constraints
01:46:34 — Can You Isolate the “Goal-to-Action” Module?
01:54:15 — Will Friendly Wrappers Successfully Control Superhuman Cores?
02:04:01 — Moral Realism and Missing Out on Cosmic Value
02:11:44 — The Paradox of AI Solving the Alignment Problem
02:19:11 — Policy Agreements: Global Pauses and China
02:26:11 — Outro: PauseCon DC 2026 Promo
Links
Bentham’s Bulldog Official Substack — https://benthams.substack.com
The post we debated — https://benthams.substack.com/p/against-if-anyone-builds-it-everyone
Apply to PauseCon DC 2026 via https://pauseai-us.org
Forethought Institute’s paper: Preparing for the Intelligence Explosion
Tom Davidson (Forethought Institute)’s post: How quick and big would a software intelligence explosion be?
Scott Alexander on the Coffeepocalypse Argument
---
Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.
Support the mission by subscribing to my Substack at DoomDebates.com and to youtube.com/@DoomDebates, or to really take things to the next level: Donate 🙏
Get full access to Doom Debates at lironshapira.substack.com/subscribe