


[Subtitle.] Why I'm moderately bearish in the short term, and explosively bullish in the long term
This is a crosspost for Thoughts on AI progress (Dec 2025) by Dwarkesh Patel, which was originally published on Dwarkesh Podcast on 2 December 2025.
What are we scaling?
I’m confused why some people have short timelines and at the same time are bullish on the current scale-up of reinforcement learning atop LLMs. If we’re actually close to a human-like learner, this whole approach of training on verifiable outcomes is doomed.
Currently the labs are trying to bake a bunch of skills into these models through “mid-training” - there's an entire supply chain of companies building RL environments which teach the model how to navigate a web browser or use Excel to write financial models. [Relatedly, see Epoch AI's post An FAQ on Reinforcement Learning Environments.]
Either these models will soon learn on the job in a self-directed way - making all this pre-baking pointless - or they won’t - which means AGI is not imminent. Humans don’t have to go through a special training phase where they need to rehearse every single piece of software [...]
---
Outline:
(00:33) What are we scaling?
(04:31) Human labor is valuable precisely because it's not shleppy to train
(06:24) Economic diffusion lag is cope for missing capabilities
(08:19) Goal post shifting is justified
(09:58) RL scaling is laundering the prestige of pretraining scaling
(10:52) Comparison to human distribution will make us at first overestimate (and then underestimate) AI
(11:45) Broadly deployed intelligence explosion
---
Linkpost URL:
https://www.dwarkesh.com/p/thoughts-on-ai-progress-dec-2025
---
Narrated by TYPE III AUDIO.
By EA Forum Team