
It'll take until ~2050 to repeat the level of scaling that pretraining compute is undergoing this decade, because growing funding can't sustain the current pace beyond ~2029 unless AI delivers a transformative commercial success by then. Natural text data will also run out around that time, and there are signs that current methods of reasoning training might be mostly eliciting capabilities already present in the base model.
If scaling of reasoning training doesn't turn out to create genuinely new capabilities that are sufficiently general, and pretraining at ~2030 levels of compute together with the low-hanging fruit of scaffolding doesn't bring AI to crucial capability thresholds, then it might take a while. Possibly decades, since training compute will be growing 3x-4x slower after 2027-2029 than it does now, so the ~6 years of scaling since the ChatGPT moment stretch across 20-25 subsequent years, not even having access to any [...]
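A rough back-of-the-envelope sketch of the arithmetic behind these estimates, in Python. The specific growth rates are illustrative assumptions (a funding-driven ~4.5x/year now, falling to a hardware-only ~1.5x/year after the slowdown), not figures taken from the post:

```python
import math

# Assumed annual compute growth rates (illustrative, not from the source):
fast_growth = 4.5  # x per year during the funding-driven 2020s
slow_growth = 1.5  # x per year once only hardware improvements remain

years_fast = 6  # ~ChatGPT moment (late 2022) to ~2029
total_multiple = fast_growth ** years_fast  # total scaling achieved this decade

# Ratio of growth exponents: how many slow years equal one fast year.
slowdown = math.log(fast_growth) / math.log(slow_growth)
years_slow = years_fast * slowdown  # years to repeat the same multiple

print(f"scaling multiple over {years_fast} fast years: {total_multiple:,.0f}x")
print(f"slowdown factor: {slowdown:.1f}x")       # ~3.7, i.e. 3x-4x slower
print(f"years to repeat at slow pace: {years_slow:.0f}")  # ~22, so ~2029 + 22 ≈ 2050
```

Under these assumed rates, the exponent of compute growth shrinks by a factor of ~3.7, which is how ~6 years of scaling stretch to roughly 20-25 years and land the repeat around ~2050.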
---
Outline:
(01:14) Training Compute Slowdown
(04:43) Bounded Potential of Thinking Training
(07:43) Data Inefficiency of MoE
The original text contained 4 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.