October 23, 2025

Are We Misreading the AI Exponential? Julian Schrittwieser on Move 37 & Scaling RL (Anthropic)

1 hour 9 minutes

Are we failing to understand the exponential, again?

My guest is Julian Schrittwieser (top AI researcher at Anthropic; previously at Google DeepMind, where he worked on AlphaGo Zero & MuZero). We unpack his viral post (“Failing to Understand the Exponential, again”) and what it looks like when the length of tasks AI can complete doubles every 3–4 months, pointing to AI agents that can work a full day autonomously by 2026 and expert-level breadth by 2027. We talk about the original Move 37 moment and whether today’s AI models can spark alien insights in code, math, and science, including Julian’s timeline for when AI could produce Nobel-level breakthroughs.

We go deep on the recipe of the moment (pre-training + RL): why it took time to combine them, what “RL from scratch” gets right and wrong, and how implicit world models show up in LLM agents. Julian explains the current rewards frontier (human prefs, rubrics, RLVR, process rewards), what we know about compute & scaling for RL, and why most builders should start with tools + prompts before considering RL-as-a-service. We also cover evals & Goodhart’s law (e.g., GDPval vs real usage), the latest in mechanistic interpretability (think “Golden Gate Claude”), and how safety & alignment actually surface in Anthropic’s launch process.

Finally, we zoom out: what 10× knowledge-work productivity could unlock across medicine, energy, and materials, how jobs adapt (complementarity over 1-for-1 replacement), and why the near term is likely a smooth ramp—fast, but not a discontinuity.

Julian Schrittwieser

Blog - https://www.julian.ac

X/Twitter - https://x.com/mononofu

Viral post: Failing to understand the exponential, again (9/27/2025)

Anthropic

Website - https://www.anthropic.com

X/Twitter - https://x.com/anthropicai

Matt Turck (Managing Director)

Blog - https://www.mattturck.com

LinkedIn - https://www.linkedin.com/in/turck/

X/Twitter - https://twitter.com/mattturck

FIRSTMARK

Website - https://firstmark.com

X/Twitter - https://twitter.com/FirstMarkCap

(00:00) Cold open — “We’re not seeing any slowdown.”

(00:32) Intro — who Julian is & what we cover

(01:09) The “exponential” from inside frontier labs

(04:46) 2026–2027: agents that work a full day; expert-level breadth

(08:58) Benchmarks vs reality: long-horizon work, GDPval, user value

(10:26) Move 37 — what actually happened and why it mattered

(13:55) Novel science: AlphaCode/AlphaTensor → when does AI earn a Nobel?

(16:25) Discontinuity vs smooth progress (and warning signs)

(19:08) Does pre-training + RL get us there? (AGI debates aside)

(20:55) Sutton’s “RL from scratch”? Julian’s take

(23:03) Julian’s path: Google → DeepMind → Anthropic

(26:45) AlphaGo (learn + search) in plain English

(30:16) AlphaGo Zero (no human data)

(31:00) AlphaZero (one algorithm: Go, chess, shogi)

(31:46) MuZero (planning with a learned world model)

(33:23) Lessons for today’s agents: search + learning at scale

(34:57) Do LLMs already have implicit world models?

(39:02) Why RL on LLMs took time (stability, feedback loops)

(41:43) Compute & scaling for RL — what we see so far

(42:35) Rewards frontier: human prefs, rubrics, RLVR, process rewards

(44:36) RL training data & the “flywheel” (and why quality matters)

(48:02) RL & Agents 101 — why RL unlocks robustness

(50:51) Should builders use RL-as-a-service? Or just tools + prompts?

(52:18) What’s missing for dependable agents (capability vs engineering)

(53:51) Evals & Goodhart — internal vs external benchmarks

(57:35) Mechanistic interpretability & “Golden Gate Claude”

(1:00:03) Safety & alignment at Anthropic — how it shows up in practice

(1:03:48) Jobs: human–AI complementarity (comparative advantage)

(1:06:33) Inequality, policy, and the case for 10× productivity → abundance

(1:09:24) Closing thoughts

By Matt Turck

2424 ratings
