March 11, 2026

Optimal Linear Decay Learning Rate Schedules and Further Refinements

18 minutes

In this episode:
• The Death of Cosine?: Introduction to the episode and the paper. Professor Norris expresses his skepticism about changing established habits like Cosine Annealing, while Linda teases a shake-up in the status quo.
• Theory vs. Reality: A discussion of the large gap between the learning rate schedules that theory prescribes (like 1/t) and what practitioners actually use. Linda explains why the theory has historically failed to match practice.
• Linear Decay Takes the Crown: Linda explains the paper's core theoretical finding: that a simple linear decay is optimal for the last iterate of SGD, challenging the dominance of Cosine Decay.
• Refining the Schedule: A deep dive into the 'Refinement' technique, in which previously observed gradient norms dictate the future schedule. Discussion of how 'warm-up' emerges naturally from the mathematics rather than being a heuristic hack.
• The Verdict and The Future: Reviewing the experimental results across Vision and LLMs. Final thoughts on whether practitioners should actually switch to Linear Decay or Refined schedules.
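
For listeners who want to see the schedules discussed above side by side, here is a minimal Python sketch. The linear and cosine functions are the standard textbook forms. The `refined_schedule` function is only an illustration of the refinement idea as summarized in the episode: it assumes each step is weighted by its inverse squared gradient norm and multiplied by the sum of the weights of the remaining steps, then rescaled so the peak is 1. All function and parameter names are hypothetical, and the smoothing of gradient norms that a practical version would likely need is left out.

```python
import math


def linear_decay(step, total_steps, base_lr=1.0):
    """Learning rate that falls linearly from base_lr to 0 at total_steps."""
    return base_lr * (1.0 - step / total_steps)


def cosine_decay(step, total_steps, base_lr=1.0):
    """Standard half-cosine decay from base_lr to 0 at total_steps."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))


def refined_schedule(grad_norms):
    """Illustrative refinement sketch (an assumption, not a reference implementation).

    Each step t gets weight w_t = 1 / ||g_t||^2, and its learning rate is
    proportional to w_t times the sum of the weights of all later steps.
    The result is rescaled so that the largest value equals 1.
    """
    weights = [1.0 / (g * g) for g in grad_norms]

    # tail[t] holds the sum of weights strictly after step t.
    tail = [0.0] * len(weights)
    running = 0.0
    for t in range(len(weights) - 1, -1, -1):
        tail[t] = running
        running += weights[t]

    raw = [w * s for w, s in zip(weights, tail)]
    peak = max(raw)
    return [r / peak for r in raw]


if __name__ == "__main__":
    total = 10
    for step in range(total + 1):
        print(step, linear_decay(step, total), cosine_decay(step, total))

    # Hypothetical gradient norms that start large and then settle.
    print(refined_schedule([4.0, 2.0, 1.0, 1.0, 1.0]))
```

Under these assumptions, constant gradient norms make the refined schedule collapse to plain linear decay, while gradient norms that start large and then shrink produce a schedule that rises before it falls, which is the sense in which warm-up can come out of the mathematics instead of being bolted on.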

By Mechanical Dirk
