In this episode:
• The Death of Cosine?: Introduction to the episode and the paper. Professor Norris expresses his skepticism about changing established habits like Cosine Annealing, while Linda teases a shake-up in the status quo.
• Theory vs. Reality: A discussion of the massive gap between the learning rate schedules that theory prescribes (like 1/t) and what practitioners actually use. Linda explains why the theory has historically failed to match practice.
• Linear Decay Takes the Crown: Linda explains the paper's core theoretical finding: a simple linear decay is optimal for the last iterate of SGD, challenging the dominance of Cosine Decay.
• Refining the Schedule: A deep dive into the 'Refinement' technique, in which past gradient norms dictate the future schedule. Discussion of how 'warm-up' emerges naturally from the mathematics rather than being a heuristic hack.
• The Verdict and The Future: A review of the experimental results across vision models and LLMs. Final thoughts on whether practitioners should actually switch to Linear Decay or Refined schedules.