LessWrong (30+ Karma)

“Can governments quickly and cheaply slow AI training?” by joshc



I originally wrote this as a private doc for people working in the field - it's not super polished, or optimized for a broad audience.

But I'm publishing anyway because inference-verification is a new and exciting area, and there aren't many bird's-eye-view explainers of what's going on in it and what the bottlenecks are.

1. Summary

I think powerful AI will be obviously scary at some point, and companies or governments might want to slow it down to buy time for additional safety or oversight. Maybe this could be done quickly, e.g. by:

  1. Unplugging inter-server cables to slow gradient syncs
  2. Limiting bandwidth with simple devices
  3. Periodically erasing clusters to delete covert training checkpoints
  4. Recomputing a sample of outputs to confirm they are, in fact, inference generations

(See Section 2 for details on each method.)
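To make the fourth idea (output re-computation) concrete, here is a minimal sketch of a spot check. It is not the post's actual protocol. It assumes inference is deterministic given a prompt and a logged seed, and that the verifier can rerun it. The names `toy_generate` and `spot_check`, and the log record layout, are hypothetical and chosen only for illustration.

```python
import hashlib
import random


def toy_generate(prompt: str, seed: int) -> str:
    """Hypothetical stand-in for deterministic inference.

    Assumption: the same (prompt, seed) always yields the same output.
    """
    digest = hashlib.sha256(f"{seed}:{prompt}".encode()).hexdigest()
    return digest[:16]


def spot_check(log, generate, sample_fraction, rng):
    """Recompute a random sample of logged outputs.

    Returns the sorted indices of sampled records whose logged output
    does not match the recomputed output.
    """
    if not log:
        return []
    k = max(1, round(sample_fraction * len(log)))
    sampled = rng.sample(range(len(log)), k)
    return sorted(
        i for i in sampled
        if generate(log[i]["prompt"], log[i]["seed"]) != log[i]["output"]
    )


# Build a log of 1,000 honest records (hypothetical data).
log = [
    {"prompt": f"prompt-{i}", "seed": i, "output": toy_generate(f"prompt-{i}", i)}
    for i in range(1000)
]

# Tamper with every tenth record to imitate traffic that is not real inference.
for i in range(0, 1000, 10):
    log[i]["output"] = "not-an-inference-output"

# Check a 5% sample; any mismatch is evidence of non-inference traffic.
mismatches = spot_check(log, toy_generate, sample_fraction=0.05, rng=random.Random(0))
print(f"{len(mismatches)} mismatches found in the sample")
```

The point of sampling is that the verifier does not have to recompute everything. If a fraction p of records are not genuine inference outputs and k records are checked, the chance of catching at least one is roughly 1 - (1 - p)^k, which in this toy setup (p = 0.1, k = 50) is above 99%.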

Would these methods actually work? Or more specifically, if these methods were implemented quickly and correctly, would they substantially slow AI development?

I looked into this question for around a week, and here are my current views:

Current prototypes of inference-verification would probably be ineffective. Standard inference-verification measures slow training by restricting communication between servers (see Section 2), since training involves chucking big gradients around in a hivemind, and inference [...]

---

Outline:

(00:28) 1. Summary

(05:25) 2. Ways to quickly and cheaply slow training by restricting communication

(06:31) 2.1. Method #1: Disconnect inter-rack high-speed cables

(07:07) 2.2. Method #2: Tap-verified bandwidth limits

(08:33) 2.3. Method #3: Output re-computation

(11:34) 2.4. Method #4: Memory wipes

(13:20) 2.5. Method #5: Proof of work / proof of memory

(14:35) 3. Ways to efficiently continue training despite these constraints

(15:09) 3.1. Method #1: Larger batch size + infrequent SGD steps

(16:35) 3.2. Method #2: Periodically merge independent training runs

(18:40) 3.3. Method #3: Compress gradients and weights

(20:54) 3.4. Method #4: Use more compute for inference rollouts, and less for training

(24:16) 4. But more aggressive verification methods would probably make training with current algorithms impractical

(26:56) 5. However, if developers (or AIs) have a lot of time to research better algorithms, all bets are off

(29:50) 6. Conclusion

(30:13) Appendix

(30:16) Are we in the serially bottlenecked training regime? A BOTEC by Claude

(30:23) Setup

(31:13) Key Formula

(31:41) B_crit at Frontier Scale

(32:20) How Many GPUs Per Model Replica?

(32:48) Achievable Batch Size vs. B_crit

(33:22) Key Takeaways

(35:09) Caveats

(36:48) Sources

---

First published:

March 7th, 2026

Source:

https://www.lesswrong.com/posts/Xzf3eMnhTko7AxnEy/can-governments-quickly-and-cheaply-slow-ai-training

---

Narrated by TYPE III AUDIO.

---

LessWrong (30+ Karma), by LessWrong

