September 19, 2025

Fantastic Pretraining Optimizers and Where to Find Them

13 minutes

In this episode:
• The Optimizer Royal Rumble: Professor Norris and Linda survey the chaotic landscape of LLM optimizers, where everyone claims to beat the reigning champion, AdamW. They then introduce today's paper, which aims to referee this messy fight.
• The Art of the Unfair Comparison: Linda explains the paper's core thesis: many new optimizers seem fast only because they are compared against poorly tuned baselines. Professor Norris agrees, highlighting the critical importance of fair hyperparameter tuning.
• Diminishing Returns and Shifting Allegiances: The hosts dive into the paper's main findings, discussing how the speedup of new optimizers shrinks with model size and how the 'best' optimizer can change depending on the amount of training data.
• So... Do We Ditch AdamW?: Norris and Linda distill the takeaways for practitioners. They conclude that although AdamW's dominance is being challenged, its rivals' victories are not as clear-cut as claimed, and they praise the paper for its methodological rigor.

By Mechanical Dirk
