

“The resources used to train the model can be repurposed to run millions of instances of it (this matches projected cluster sizes by ~2027), and the model can absorb information and generate actions at roughly 10x-100x human speed. … We could summarize this as a ‘country of geniuses in a datacenter’.”
Dario Amodei, CEO of Anthropic, Machines of Loving Grace
“Let's say each copy of GPT-4 is producing 10 words per second. It turns out they would be able to run something like 300,000 copies of GPT-4 in parallel. And by the time they are training GPT-5 it will be a more extreme situation where just using the computer chips they used to train GPT-5, using them to kind of run copies of GPT-5 in parallel, you know, again, each producing 10 words per second, they’d be able to run 3 million copies of GPT-5 in parallel. And for [...]
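The arithmetic behind the quoted figure can be sketched as a back-of-envelope calculation: a cluster's effective training throughput (total training FLOPs divided by training time) can instead be spent on forward passes at roughly 2N FLOPs per token for a model with N active parameters. All numbers below are illustrative assumptions, not confirmed figures for any real model:

```python
# Back-of-envelope sketch of "training compute repurposed for inference".
# Every constant here is an assumption for illustration only.

TRAIN_FLOPS = 2e25            # assumed total training compute
TRAIN_SECONDS = 100 * 86400   # assumed ~100-day training run
ACTIVE_PARAMS = 2.8e11        # assumed active parameters per token
WORDS_PER_SEC = 10            # per-copy generation speed used in the quote
TOKENS_PER_WORD = 1.3         # rough English tokens-per-word ratio

# Effective sustained throughput of the training cluster.
cluster_flops_per_sec = TRAIN_FLOPS / TRAIN_SECONDS

# Standard ~2N FLOPs per token for one forward pass.
flops_per_token = 2 * ACTIVE_PARAMS

# Total generation rate if the whole cluster runs inference.
total_tokens_per_sec = cluster_flops_per_sec / flops_per_token

# Number of parallel copies at the quoted per-copy speed.
copies = total_tokens_per_sec / (WORDS_PER_SEC * TOKENS_PER_WORD)
print(f"{copies:,.0f} parallel copies")
```

Under these assumed inputs the result lands on the order of a few hundred thousand parallel copies, the same ballpark as the quote's 300,000 figure.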
---
Outline:
(02:28) Section I - The Question
(05:13) Section II - The Scenario
(10:54) Section III - Existing Estimates
(19:53) Section IV - Compute
(27:16) Section V - Inference
(39:15) Section VI - Human Equivalents
(45:38) Section VII - The Estimates
(46:04) Method 1: Total training to inference per token ratio
(50:40) Method 2: Flat inference costs
(53:39) Method 3: Human brain equivalent
(55:02) Method 4: Chip capabilities
(58:00) Method 5: Adjusting for capabilities per token
(59:01) Section VIII - Implications
The original text contained 4 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.
By LessWrong
