Doom Debates

The Man Who Might SOLVE AI Alignment — Dr. Steven Byrnes, AGI Safety Researcher @ Astera Institute



Dr. Steven Byrnes, UC Berkeley physics PhD and Harvard physics postdoc, is an AI safety researcher at the Astera Institute and one of the most rigorous thinkers working on the technical AI alignment problem.

Steve has a whopping 90% P(Doom), but unlike most AI safety researchers, who focus on current LLMs, he argues that LLMs will plateau before becoming truly dangerous and that the real threat will come from next-generation “brain-like AGI” based on actor-critic reinforcement learning.

For the last five years, he's been diving deep into neuroscience to reverse-engineer how the human brain actually works and to figure out how that knowledge can be used to solve the technical AI alignment problem. He's one of the few people who both understand why alignment is hard and are taking a serious technical shot at solving it.

We cover his "two subsystems" model of the brain, why current AI safety approaches miss the mark, his disagreements with social evolution approaches, and why understanding human neuroscience matters for building aligned AGI.
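
Since actor-critic reinforcement learning is central to the conversation, here is a minimal toy sketch of the basic idea (my own illustration, not code from Steve or the episode, and a crude tabular version rather than the model-based form he discusses): an "actor" proposes actions, a "critic" predicts how well things will go, and the critic's prediction error drives updates to both.

```python
# Toy tabular actor-critic on a 5-state chain: start at state 0, reward 1.0 at state 4.
# Illustrative only -- hyperparameters and the preference-based actor update are assumptions.
import math, random

N_STATES, N_ACTIONS = 5, 2          # actions: 0 = move left, 1 = move right
GOAL = N_STATES - 1
ALPHA_ACTOR, ALPHA_CRITIC, GAMMA = 0.1, 0.2, 0.95

actor = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # action preferences per state
critic = [0.0] * N_STATES                               # state-value estimates

def sample_action(prefs):
    # Sample an action from a softmax over the actor's preferences.
    exps = [math.exp(p - max(prefs)) for p in prefs]
    r, acc = random.random() * sum(exps), 0.0
    for a, e in enumerate(exps):
        acc += e
        if r <= acc:
            return a
    return len(prefs) - 1

def step(state, action):
    # Deterministic chain dynamics; episode ends on reaching the goal state.
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

for _ in range(500):
    state, done = 0, False
    while not done:
        action = sample_action(actor[state])
        nxt, reward, done = step(state, action)
        # Temporal-difference error: did the outcome beat the critic's prediction?
        target = reward + (0.0 if done else GAMMA * critic[nxt])
        td_error = target - critic[state]
        critic[state] += ALPHA_CRITIC * td_error          # critic refines its value estimate
        actor[state][action] += ALPHA_ACTOR * td_error    # actor is nudged by the critic's signal
        state = nxt

print("learned state values:", [round(v, 2) for v in critic])
```

Running the sketch, the critic's values grow toward the goal end of the chain and the actor comes to prefer moving right; the same actor/critic division of labor is the rough analogy Steve draws to the brain's steering and learning subsystems.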

* 00:00:00 - Cold Open: Solving the technical alignment problem

* 00:00:26 - Introducing Dr. Steven Byrnes and his impressive background

* 00:01:59 - Steve's unique mental strengths

* 00:04:08 - The cold fusion research story demonstrating Steve's approach

* 00:06:18 - How Steve got interested in neuroscience through Jeff Hawkins

* 00:08:18 - Jeff Hawkins' cortical uniformity theory and brain vs deep learning

* 00:11:45 - When Steve first encountered Eliezer's sequences and became AGI-pilled

* 00:15:11 - Steve's research direction: reverse engineering human social instincts

* 00:21:47 - Four visions of alignment success and Steve's preferred approach

* 00:29:00 - The two brain subsystems model: steering brain vs learning brain

* 00:35:30 - Brain volume breakdown and the learning vs steering distinction

* 00:38:43 - Cerebellum as the "LLM" of the brain doing predictive learning

* 00:46:44 - Language acquisition: Chomsky vs learning algorithms debate

* 00:54:13 - What LLMs fundamentally can't do: complex context limitations

* 01:07:17 - Hypothalamus and brainstem doing more than just homeostasis

* 01:13:45 - Why morality might just be another hypothalamus cell group

* 01:18:00 - Human social instincts as model-based reinforcement learning

* 01:22:47 - Actor-critic reinforcement learning mapped to brain regions

* 01:29:33 - Timeline predictions: when brain-like AGI might arrive

* 01:38:28 - Why humans still beat AI on strategic planning and domain expertise

* 01:47:27 - Inner vs outer alignment: cocaine example and reward prediction

* 01:55:13 - Why legible Python code beats learned reward models

* 02:00:45 - Outcome pumps, instrumental convergence, and the Stalin analogy

* 02:11:48 - What’s Your P(Doom)™

* 02:16:45 - Massive headroom above human intelligence

* 02:20:45 - Can AI take over without physical actuators? (Yes)

* 02:26:18 - Steve's bold claim: 30 person-years from proto-AGI to superintelligence

* 02:32:17 - Why overhang makes the transition incredibly dangerous

* 02:35:00 - Social evolution as alignment solution: why it won't work

* 02:46:47 - Steve's research program: legible reward functions vs RLHF

* 02:59:52 - AI policy discussion: why Steve is skeptical of pausing AI

* 03:05:51 - Lightning round: offense vs defense, P(simulation), AI unemployment

* 03:12:42 - Thanking Steve and wrapping up the conversation

* 03:13:30 - Liron's outro: Supporting the show and upcoming episodes with Vitalik and Eliezer

Show Notes

* Steven Byrnes' Website & Research — https://sjbyrnes.com/

* Steve’s Twitter — https://x.com/steve47285

* Astera Institute — https://astera.org/

Steve’s Sequences

* Intro to Brain-Like-AGI Safety — https://www.alignmentforum.org/s/HzcM2dkCq7fwXBej8

* Foom & Doom 1: “Brain in a box in a basement” — https://www.alignmentforum.org/posts/yew6zFWAKG4AGs3Wk/foom-and-doom-1-brain-in-a-box-in-a-basement

* Foom & Doom 2: Technical alignment is hard — https://www.alignmentforum.org/posts/bnnKGSCHJghAvqPjS/foom-and-doom-2-technical-alignment-is-hard

---

Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.

Support the mission by subscribing to my Substack at DoomDebates.com and to youtube.com/@DoomDebates.



This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit lironshapira.substack.com