Mechanical Dreams

Midtraining Bridges Pretraining and Posttraining Distributions



In this episode:
• Introduction: Do We Really Need Another Phase?: Professor Norris jokingly laments the ever-expanding terminology of LLM training, while Linda introduces the paper on 'Midtraining' as a distinct, intermediate phase between pretraining and post-training.
• The Mechanism: Building a Distributional Bridge: Linda explains the core theory: midtraining isn't just 'cooling down,' but shifting the model's initialization closer to the target distribution to smooth out the optimization path.
• Results: Where It Works (and Where It Doesn't): The hosts discuss the finding that midtraining shines in 'distant' domains like Code and Math but matters less for general instruction following, and cover the surprising reduction in catastrophic forgetting.
• The Plasticity Window: Timing and Mixtures: A deep dive into the interaction between when you start midtraining and how much specialized data you use, highlighting the dangers of late, aggressive data injection.
• Conclusion: Better Than Continued Pretraining?: Norris concedes the method's utility after seeing the comparison against standard continued pretraining, and the pair summarize the practical takeaways for training schedules.

By Mechanical Dirk