May 26, 2026

Learning Rates Regulate Catastrophic Overtraining

20 minutes

In this episode:
• Introduction to Catastrophic Overtraining: Linda and Professor Norris introduce the paper and the counterintuitive phenomenon where better pretraining leads to worse catastrophic forgetting.
• Feature Drift and Optimization Regimes: The hosts discuss how the supervised finetuning learning rate acts as an implicit regularizer, introducing the Mean Principal Angle to measure feature drift.
• Sharpness and the Edge of Stability: Linda connects the mystery of overtraining to pretraining learning rate decay, explaining how model sharpness amplifies the finetuning learning rate.
• Practical Takeaways for LLM Training: Professor Norris and Linda summarize the actionable advice from the paper, including lowering SFT learning rates and rethinking pretraining schedules.

By Mechanical Dirk
