Mechanical Dreams

The Coverage Principle: How Pre-training Enables Post-Training

In this episode:
• Why a Good Pre-trainer Isn't Always a Good Fine-tuner: The hosts introduce the puzzle of pre-training: why doesn't a lower cross-entropy loss always guarantee better performance after fine-tuning? They set the stage for today's paper, which proposes a new perspective.
• Are We Covering Our Bases? The Coverage Principle: Linda explains the paper's central concept of 'coverage,' a metric that measures whether a model assigns at least some probability to a wide range of high-quality responses, and contrasts it with the pitfalls of cross-entropy (a minimal sketch follows this list).
• The Implicit Genius of Next-Token Prediction: The hosts dive into the paper's main theoretical result, explaining how the standard next-token prediction objective implicitly optimizes for good coverage, and why this metric is a much better predictor of downstream success than raw loss.
• From Theory to Practice: Interventions for Better Coverage: The discussion turns to practical applications, exploring the paper's proposed methods for actively improving coverage, including gradient normalization schemes and novel checkpoint selection strategies (illustrative sketches follow this list).
• What's Next for Coverage?: Professor Norris and Linda recap the key insight that coverage is a crucial link between pre-training and post-training success, and discuss the future research directions this new perspective opens up.
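Concretely, a coverage-style metric of the kind the second segment describes might look like the minimal sketch below. The `model.token_logprobs` interface and the threshold `eps` are illustrative assumptions for this sketch, not the paper's actual definitions.

```python
import math

def sequence_logprob(model, prompt, response):
    """Sum of per-token log-probabilities that `model` assigns to
    `response` given `prompt`. `model.token_logprobs` is a hypothetical
    interface returning one log-probability per response token."""
    return sum(model.token_logprobs(prompt, response))

def cross_entropy(model, prompt, responses):
    """Average negative log-likelihood over the reference responses:
    the familiar pre-training-style loss."""
    return -sum(sequence_logprob(model, prompt, r) for r in responses) / len(responses)

def coverage(model, prompt, responses, eps=1e-4):
    """Fraction of high-quality reference responses that receive at
    least probability `eps`. A single badly scored response can blow up
    cross-entropy, but it lowers coverage by at most 1/len(responses)."""
    hits = sum(math.exp(sequence_logprob(model, prompt, r)) >= eps for r in responses)
    return hits / len(responses)
```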
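The interventions discussed in the fourth segment could take many forms; the sketch below is one illustrative guess at each, reusing the `coverage` function above. Both the unit-norm gradient step and the coverage-based checkpoint ranking are assumptions made for illustration, not the paper's actual procedures.

```python
import torch

def normalized_step(model, loss, lr=1e-3, eps=1e-8):
    """One simple form of gradient normalization: rescale the global
    gradient to unit norm before applying it, decoupling the step size
    from the raw magnitude of the loss."""
    model.zero_grad()
    loss.backward()
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads)).item()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                # unit-norm gradient scaled by the learning rate
                p.add_(p.grad, alpha=-lr / (total_norm + eps))

def select_checkpoint(checkpoints, eval_set, eps=1e-4):
    """Rank pre-training checkpoints by average held-out coverage rather
    than by validation cross-entropy. `eval_set` is a list of
    (prompt, reference_responses) pairs; `coverage` is defined above."""
    def avg_coverage(model):
        scores = [coverage(model, prompt, refs, eps) for prompt, refs in eval_set]
        return sum(scores) / len(scores)
    return max(checkpoints, key=avg_coverage)
```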

Mechanical Dreams, by Mechanical Dirk