


I originally wrote this as a private doc for people working in the field - it's not super polished, or optimized for a broad audience.
But I'm publishing it anyway because inference-verification is a new and exciting area, and there aren't many bird's-eye-view explainers of what's going on in it and what the bottlenecks are.
1. Summary
I think powerful AI will be obviously scary at some point, and companies or governments might want to slow it down to buy time for additional safety or oversight. Maybe this could be done quickly, e.g. by:
(Section 2)
Would these methods actually work? Or more specifically, if these methods were implemented quickly and correctly, would they substantially slow AI development?
I looked into this question for around a week, and here are my current views:
Current prototypes of inference-verification would probably be ineffective. Standard inference-verification measures slow training by restricting communication between servers (see Section 2), since training involves chucking big gradients around in a hivemind, and inference [...]
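As a rough illustration of why restricting communication between servers slows training, here is a minimal back-of-the-envelope sketch. It is not from the original post: it assumes gradients are synchronized with a ring all-reduce, counts only bandwidth (no latency or overlap with compute), and every number in it (parameter count, precision, worker count, link speeds) is a hypothetical placeholder.

```python
def ring_allreduce_seconds(num_params, bytes_per_value, num_workers, link_bytes_per_s):
    """Bandwidth-only time for one ring all-reduce of a full gradient.

    Assumption (not from the post): each worker sends and receives
    2 * (N - 1) / N times the gradient size, the standard ring all-reduce volume.
    """
    payload_bytes = num_params * bytes_per_value
    traffic_per_worker = 2 * (num_workers - 1) / num_workers * payload_bytes
    return traffic_per_worker / link_bytes_per_s


# Hypothetical setup: 1e9 parameters, 2-byte gradients, 8 workers.
PARAMS = 10**9
BYTES_PER_VALUE = 2
WORKERS = 8

# Hypothetical link speeds: an unrestricted link vs. one throttled 100x.
fast_link = 100e9  # bytes per second
slow_link = 1e9    # bytes per second

fast_s = ring_allreduce_seconds(PARAMS, BYTES_PER_VALUE, WORKERS, fast_link)
slow_s = ring_allreduce_seconds(PARAMS, BYTES_PER_VALUE, WORKERS, slow_link)

print(f"sync time on fast link: {fast_s:.3f} s")
print(f"sync time on slow link: {slow_s:.3f} s")
print(f"slowdown per gradient sync: {slow_s / fast_s:.0f}x")
```

Under these assumptions the time per gradient sync scales inversely with link bandwidth, so a bandwidth cap only bites if each optimizer step still needs a full sync. That is the lever the workarounds listed in Section 3 of the outline target.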
---
Outline:
(00:28) 1. Summary
(05:25) 2. Ways to quickly and cheaply slow training by restricting communication
(06:31) 2.1. Method #1: Disconnect inter-rack high-speed cables
(07:07) 2.2. Method #2: Tap-verified bandwidth limits
(08:33) 2.3. Method #3: Output re-computation
(11:34) 2.4. Method #4: Memory wipes
(13:20) 2.5. Method #5: Proof of work / proof of memory
(14:35) 3. Ways to efficiently continue training despite these constraints
(15:09) 3.1. Method #1: Larger batch size + infrequent SGD steps
(16:35) 3.2. Method #2: Periodically merge independent training runs
(18:40) 3.3. Method #3: Compress gradients and weights
(20:54) 3.4. Method #4: Use more compute for inference rollouts, and less for training
(24:16) 4. But more aggressive verification methods would probably make training with current algorithms impractical
(26:56) 5. However, if developers (or AIs) have a lot of time to research better algorithms, all bets are off
(29:50) 6. Conclusion
(30:13) Appendix
(30:16) Are we in the serially bottlenecked training regime? A BOTEC by Claude
(30:23) Setup
(31:13) Key Formula
(31:41) B_crit at Frontier Scale
(32:20) How Many GPUs Per Model Replica?
(32:48) Achievable Batch Size vs. B_crit
(33:22) Key Takeaways
(35:09) Caveats
(36:48) Sources
---
Narrated by TYPE III AUDIO.
By LessWrong
