In this episode:
• Introduction: The Alchemy of Training: Professor Norris laments the 'black magic' of hyperparameter tuning, and Linda introduces the paper 'Predictable Scale: Part I, Step Law' which promises to turn that alchemy into science.
• The Million-Hour Experiment: The hosts discuss the unprecedented scale of the study, involving 3,700 models and nearly one million H800 GPU hours, to map the loss landscape.
• Defining the Step Law: Linda explains the core mathematical findings: how Learning Rate scales with model size (N) and data size (D), and the surprising revelation that optimal Batch Size depends almost entirely on D, not N.
• Universality: MoEs and Data Recipes: A deep dive into how the Step Law holds up for sparse Mixture-of-Experts (MoE) models and varying data distributions (such as code-heavy or multilingual mixtures), outperforming earlier scaling laws such as those proposed by DeepSeek and OpenAI.
• Conclusion: A Plug-and-Play Future: Norris concedes that the empirical evidence is overwhelming. They wrap up with the implications for efficient LLM training and what this means for the industry.
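
To make the "plug-and-play" idea concrete, here is a minimal sketch of how a Step Law-style predictor would be used. The power-law *form* follows the episode's description (learning rate depends on both model size N and data size D; batch size depends almost entirely on D), but every constant and exponent below is an illustrative placeholder, not the paper's fitted values:

```python
# Sketch of a Step Law-style hyperparameter predictor.
# NOTE: all constants and exponents here are illustrative placeholders --
# consult the paper for the actual fitted values.

def optimal_learning_rate(n_params: float, n_tokens: float,
                          c: float = 1.0,
                          alpha: float = -0.7,
                          beta: float = 0.3) -> float:
    """Optimal LR as a power law in model size N and data size D."""
    return c * (n_params ** alpha) * (n_tokens ** beta)

def optimal_batch_size(n_tokens: float,
                       c: float = 1.0,
                       gamma: float = 0.5) -> float:
    """Optimal batch size (in tokens) depends on data size D alone,
    per the episode's summary of the Step Law finding."""
    return c * (n_tokens ** gamma)

if __name__ == "__main__":
    N = 1e9   # a 1B-parameter model
    D = 1e11  # 100B training tokens
    print(f"predicted LR: {optimal_learning_rate(N, D):.3e}")
    print(f"predicted batch size (tokens): {optimal_batch_size(D):.3e}")
```

The practical appeal discussed in the episode is exactly this shape: once the exponents are fitted on small runs, large-run hyperparameters fall out of two closed-form evaluations rather than a costly sweep.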