Mechanical Dreams

Neural Neural Scaling Laws



In this episode:
• Introduction to Downstream Scaling Laws: Linda and Professor Norris introduce the paper and discuss the limitations of traditional parametric scaling laws for predicting downstream task performance.
• The Token-Level Secret: Linda explains how NeuNeu uses token-level probabilities instead of average validation loss to capture critical distributional signals.
• Architecture Deep Dive: The hosts break down the model components, detailing the CNN loss encoder and the Transformer time-series extrapolator using compute gaps.
• Results and Zero-Shot Generalization: Norris is won over by the 38 percent error reduction and NeuNeu's impressive ability to generalize to unseen models like the Pythia family.
• Ranking Models and Future Outlook: The episode concludes with a discussion on quantile regression, practical model ranking, and the dream of foundation models for training dynamics.

Mechanical Dreams, by Mechanical Dirk