In this episode:
• Introduction: Do We Really Need Another Phase?: Professor Norris jokingly laments the ever-expanding terminology of LLM training, while Linda introduces a paper that frames 'midtraining' as a distinct, intermediate phase between pretraining and post-training.
• The Mechanism: Building a Distributional Bridge: Linda explains the core theory: midtraining isn't just 'cooling down' the learning rate, but gradually shifting the training data toward the target distribution so that post-training starts from a closer initialization, smoothing the optimization path (see the sketch after this list).
• Results: Where It Works (and Where It Doesn't): The hosts discuss the finding that midtraining shines in 'distant' domains like code and math but matters less for general instruction-following, and cover the surprising result that it also reduces catastrophic forgetting.
• The Plasticity Window: Timing and Mixtures: A deep dive into the interaction between when you start midtraining and how much specialized data you use, highlighting the dangers of late, aggressive data injection.
• Conclusion: Better Than Continued Pretraining?: Norris concedes the method's utility after seeing the comparison against standard continued pretraining, and the pair summarize the practical takeaways for training schedules.
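
For listeners who want the mechanism in concrete terms, here is a minimal sketch of the kind of mixture schedule the episode describes: a midtraining window that linearly ramps the share of target-domain data (e.g., code or math) between pretraining and post-training. All function names and parameter values are illustrative assumptions, not taken from the paper.

```python
# Hypothetical midtraining mixture schedule (illustrative, not from the paper).
# Between a chosen start fraction of training and the end, linearly ramp the
# share of target-domain data so the model's effective initialization for
# post-training drifts toward the target distribution, rather than injecting
# specialized data late and aggressively.

def midtrain_mixture_weight(step: int,
                            total_steps: int,
                            midtrain_start_frac: float = 0.8,
                            max_target_frac: float = 0.5) -> float:
    """Fraction of each batch drawn from the target domain at `step`."""
    start = int(midtrain_start_frac * total_steps)
    if step < start:
        return 0.0  # still in the pure pretraining mixture
    # Linear ramp from 0 to max_target_frac across the midtraining window.
    progress = (step - start) / max(1, total_steps - start)
    return max_target_frac * progress

if __name__ == "__main__":
    total = 100_000
    for s in (0, 79_999, 80_000, 90_000, 100_000):
        print(f"step {s:>7}: target fraction = {midtrain_mixture_weight(s, total):.3f}")
```

The ramp shape and the 80% start point are placeholders; the episode's "plasticity window" discussion suggests the real interaction between start time and mixture ratio is what determines whether the bridge helps or hurts.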