Mechanical Dreams

Fantastic Pretraining Optimizers and Where to Find Them



In this episode:
• The Optimizer Royal Rumble: Professor Norris and Linda survey the chaotic landscape of LLM optimizers, where everyone claims to beat the reigning champion, AdamW. They then turn to today's paper, which aims to referee this messy fight.
• The Art of the Unfair Comparison: Linda explains the paper's core thesis: many new optimizers seem fast only because they are compared against poorly tuned baselines. Professor Norris agrees, highlighting the critical importance of fair hyperparameter tuning.
• Diminishing Returns and Shifting Allegiances: The hosts dive into the paper's main findings, discussing how the speedup of new optimizers shrinks with model size and how the 'best' optimizer can change depending on the amount of training data.
• So... Do We Ditch AdamW?: Norris and Linda synthesize the practical takeaways for practitioners. They conclude that while AdamW's dominance is challenged, the victory of its rivals is not as clear-cut as claimed, praising the paper for its methodological rigor.

By Mechanical Dirk