LessWrong (30+ Karma)

“Foom & Doom 2: Technical alignment is hard” by Steven Byrnes



2.1 Summary & Table of contents

This is the second of a two-post series on foom (previous post) and doom (this post).

The last post talked about how I expect future AI to be different from present AI. This post will argue that this future AI will be of a type that will be egregiously misaligned and scheming, not even ‘slightly nice’, absent some future conceptual breakthrough.

I will particularly focus on exactly how and why I differ from the LLM-focused researchers who wind up with (from my perspective) bizarrely over-optimistic beliefs like “P(doom) ≲ 50%”.[1]

In particular, I will argue that these “optimists” are right that “Claude seems basically nice, by and large” is nonzero evidence for feeling good about current LLMs (with various caveats). But I think that future AIs will be disanalogous to current LLMs, and I will dive into exactly how and why, with a [...]

---

Outline:

(00:12) 2.1 Summary & Table of contents

(04:42) 2.2 Background: my expected future AI paradigm shift

(06:18) 2.3 On the origins of egregious scheming

(07:03) 2.3.1 Where do you get your capabilities from?

(08:07) 2.3.2 LLM pretraining magically transmutes observations into behavior, in a way that is profoundly disanalogous to how brains work

(10:50) 2.3.3 To what extent should we think of LLMs as imitating?

(14:26) 2.3.4 The naturalness of egregious scheming: some intuitions

(19:23) 2.3.5 Putting everything together: LLMs are generally not scheming right now, but I expect future AI to be disanalogous

(23:41) 2.4 I'm still worried about the 'literal genie' / 'monkey's paw' thing

(26:58) 2.4.1 Sidetrack on disanalogies between the RLHF reward function and the brain-like AGI reward function

(32:01) 2.4.2 Inner and outer misalignment

(34:54) 2.5 Open-ended autonomous learning, distribution shifts, and the 'sharp left turn'

(38:14) 2.6 Problems with amplified oversight

(41:24) 2.7 Downstream impacts of 'Technical alignment is hard'

(43:37) 2.8 Bonus: Technical alignment is not THAT hard

(44:04) 2.8.1 I think we'll get to pick the innate drives (as opposed to the evolution analogy)

(45:44) 2.8.2 I'm more bullish on impure consequentialism

(50:44) 2.8.3 On the narrowness of the target

(52:18) 2.9 Conclusion and takeaways

(52:23) 2.9.1 If brain-like AGI is so dangerous, shouldn't we just try to make AGIs via LLMs?

(54:34) 2.9.2 What's to be done?

The original text contained 20 footnotes which were omitted from this narration.

---

First published:

June 23rd, 2025

Source:

https://www.lesswrong.com/posts/bnnKGSCHJghAvqPjS/foom-and-doom-2-technical-alignment-is-hard

---

Narrated by TYPE III AUDIO.


