Mechanical Dreams

Scaling Laws for Precision



In this episode:
• Introduction to Precision in Scaling Laws: Linda introduces a new paper that adds precision as a third variable, alongside parameters and data, to the Chinchilla scaling laws. Professor Norris reflects on how precision is usually treated as an afterthought.
• The Post-Training Quantization Paradox: The hosts discuss the surprising finding that training models on more and more data can make them degrade more once post-training quantization is applied.
• Effective Parameters and Low-Precision Training: Linda explains the concept of effective parameter count, and how lowering the precision of weights, activations, and the KV cache shrinks the model's effective size multiplicatively (see the sketch after this list).
• Finding the Compute-Optimal Precision: Professor Norris is surprised to learn that compute-optimal pretraining precision is around 7 to 8 bits, completely independent of the compute budget unless model size is constrained.
• A Unified Scaling Law and Takeaways: The episode wraps up by bringing pretraining and post-training precision into a single mathematical framework, discussing what this means for the future of model training.
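
For listeners who want a concrete picture, below is a minimal Python sketch of the kind of loss model the episode describes: a Chinchilla-style loss in which the parameter count is replaced by an "effective" count that shrinks with the precision of weights, activations, and KV cache, plus a penalty term for post-training quantization. The functional forms follow the discussion above, but every constant (A, B, E, alpha, beta, the gamma sensitivities, and the degradation coefficients) is an illustrative placeholder, not the paper's fitted value.

```python
import numpy as np

# All constants below are illustrative placeholders, not the paper's fitted values.
A, B, E = 400.0, 410.0, 1.7   # Chinchilla-style coefficients
alpha, beta = 0.34, 0.28      # parameter / data exponents
gamma = 2.0                   # precision sensitivity (shared here for simplicity)

def effective_params(n, p_weights, p_acts, p_kv):
    """Effective parameter count: lowering the precision of weights, activations,
    and the KV cache each shrinks the model's effective size, multiplicatively."""
    shrink = lambda p: 1.0 - np.exp(-p / gamma)
    return n * shrink(p_weights) * shrink(p_acts) * shrink(p_kv)

def ptq_degradation(n, d, p_post, c_t=0.05, g_d=0.5, g_n=0.5, g_p=1.0):
    """Extra loss from post-training quantization: grows with tokens seen (D),
    shrinks with model size (N) and post-training precision (exact form assumed)."""
    return c_t * (d ** g_d / n ** g_n) * np.exp(-p_post / g_p)

def unified_loss(n, d, p_train, p_post):
    """Single framework: Chinchilla loss on the effective parameter count,
    plus the post-training quantization penalty."""
    n_eff = effective_params(n, p_train, p_train, p_train)
    return A * n_eff ** -alpha + B * d ** -beta + E + ptq_degradation(n, d, p_post)

# At a fixed compute budget C ~ 6 * N * D * (P / 16), sweeping training precision
# trades tokens against effective parameters, so the loss is U-shaped in precision.
compute, n = 1e21, 1e9
for p in (3, 4, 6, 8, 12, 16):
    d = compute / (6 * n * (p / 16))  # tokens affordable at this precision
    print(f"{p:>2} bits -> loss {unified_loss(n, d, p, p):.3f}")
```

With these toy constants the sweep bottoms out in the mid-to-high single digits of bits, which mirrors the episode's "7 to 8 bits is compute-optimal" takeaway; the exact minimum depends entirely on the fitted constants.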

By Mechanical Dirk