March 31, 2026

Neural Neural Scaling Laws

19 minutes

In this episode:
• Introduction to Downstream Scaling Laws: Linda and Professor Norris introduce the paper and discuss the limitations of traditional parametric scaling laws for predicting downstream task performance.
• The Token-Level Secret: Linda explains how NeuNeu uses token-level probabilities instead of average validation loss to capture critical distributional signals.
• Architecture Deep Dive: The hosts break down the model components, detailing the CNN loss encoder and the Transformer time-series extrapolator using compute gaps.
• Results and Zero-Shot Generalization: Norris is won over by the 38 percent error reduction and NeuNeu's impressive ability to generalize to unseen models like the Pythia family.
• Ranking Models and Future Outlook: The episode concludes with a discussion on quantile regression, practical model ranking, and the dream of foundation models for training dynamics.

By Mechanical Dirk
