Mechanical Dreams

Optimal Linear Decay Learning Rate Schedules and Further Refinements



In this episode:
• The Death of Cosine?: Introduction to the episode and the paper. Professor Norris expresses his skepticism about changing established habits like Cosine Annealing, while Linda teases a shake-up in the status quo.
• Theory vs. Reality: A discussion of the massive gap between the learning rate schedules that theory prescribes (such as 1/t decay) and what practitioners actually use. Linda explains why the theory has historically failed to match practice.
• Linear Decay Takes the Crown: Linda explains the paper's core theoretical finding: that a simple linear decay is optimal for the last iterate of SGD, challenging the dominance of Cosine Decay.
• Refining the Schedule: A deep dive into the 'Refinement' technique, in which previously observed gradient norms dictate the future schedule. A discussion of how warm-up emerges naturally from the mathematics rather than as a heuristic hack.
• The Verdict and The Future: A review of the experimental results across vision models and LLMs, and final thoughts on whether practitioners should actually switch to linear decay or refined schedules.
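The schedules discussed in this episode can be made concrete with a short sketch. Linear decay and cosine decay are written in their standard forms. The refinement rule is an assumption about the form used in the paper, not a quotation of it: the weight at step t is taken proportional to ||g_t||^-2 times the sum of squared gradient norms after step t, then rescaled so the peak is 1. The function names (`linear_decay`, `cosine_decay`, `refine_schedule`) are hypothetical. Under this assumed rule, constant gradient norms reduce exactly to linear decay, and large early gradient norms produce a warm-up phase.

```python
import math
from typing import List


def linear_decay(t: int, T: int, lr_max: float) -> float:
    """Linear decay from lr_max at step 0 to 0 at step T."""
    return lr_max * (1 - t / T)


def cosine_decay(t: int, T: int, lr_max: float) -> float:
    """Standard cosine annealing from lr_max at step 0 to 0 at step T."""
    return lr_max * 0.5 * (1 + math.cos(math.pi * t / T))


def refine_schedule(grad_norms: List[float]) -> List[float]:
    """Hypothetical refinement sketch (assumed form, not quoted from the paper).

    Weight at step t is ||g_t||^-2 * sum_{p>t} ||g_p||^2, rescaled so the
    peak weight is 1. Multiply by a peak learning rate to get a schedule.
    Gradient norms must be positive; no smoothing of the norms is applied.
    """
    if len(grad_norms) < 2:
        raise ValueError("need at least two gradient norms")
    sq = [g * g for g in grad_norms]
    weights = [0.0] * len(sq)
    tail = 0.0  # running sum of squared norms strictly after step t
    for t in range(len(sq) - 1, -1, -1):
        weights[t] = tail / sq[t]
        tail += sq[t]
    peak = max(weights)
    return [w / peak for w in weights]


# Constant gradient norms: the refined schedule is exactly linear decay.
print(refine_schedule([1.0] * 5))                  # → [1.0, 0.75, 0.5, 0.25, 0.0]
print([linear_decay(t, 4, 1.0) for t in range(5)])  # → [1.0, 0.75, 0.5, 0.25, 0.0]

# Large early gradient norms: the schedule rises before it decays (warm-up).
print([round(w, 3) for w in refine_schedule([4.0, 2.0, 1.0, 1.0, 1.0, 1.0])])
# → [0.167, 0.333, 1.0, 0.667, 0.333, 0.0]
```

The inverse-square factor is what makes warm-up appear in this sketch: steps with large gradient norms get small weights, so the schedule starts low, peaks once the norms settle, and then decays toward zero at the final step.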

Mechanical Dreams, by Mechanical Dirk