Mechanical Dreams

Latent State Models of Training Dynamics

In this episode:
• Why Does Seed 42 Work Best?: Linda introduces a paper that tries to answer a classic machine learning question: why does the random seed have such a big impact on training? Professor Norris laments that this is a problem as old as neural networks themselves.
• A Roadmap for Training: Linda explains the paper's novel approach of using a Hidden Markov Model to turn messy training dynamics into a clean 'training map' of latent states (see the code sketch after this list). Professor Norris expresses his surprise and curiosity at seeing a classic model like an HMM used to analyze modern deep learning.
• Taking the Scenic Route to Convergence: The hosts discuss the paper's key findings on 'grokking' tasks, where different random seeds trace different paths on the training map. Linda explains 'detour states,' optional stops on the map that slow convergence and that some runs get stuck in.
• You Are the Traffic Controller: Professor Norris highlights the paper's powerful conclusion that training variability isn't inherent to a task, but a result of the training setup. Linda explains how removing components like batch normalization can create detours in stable tasks, while adding them can remove detours from unstable ones.
• Maps, Not Just Metrics: Linda and Professor Norris conclude by discussing the practical implications, such as a new way to analyze and compare hyperparameter settings by looking at the structure of their training maps.
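
For listeners who want to poke at the idea themselves, here is a minimal sketch of fitting an HMM over training metrics and decoding each seed's run into a state path. It assumes the hmmlearn library; the seed count, the three toy metrics, and the four-state map are illustrative stand-ins, not the paper's actual setup.

# Minimal sketch of the 'training map' idea, not the paper's pipeline.
# Assumptions: hmmlearn is installed; the metrics below are random toy
# data standing in for real per-checkpoint statistics (loss, gradient
# norm, weight norm, ...) logged across several random seeds.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
runs = [rng.normal(size=(200, 3)) for _ in range(5)]  # 5 seeds, 200 checkpoints, 3 metrics

# hmmlearn expects one concatenated array plus per-run lengths.
X = np.concatenate(runs)
lengths = [len(r) for r in runs]

# Each latent state of the HMM summarizes one phase of training.
hmm = GaussianHMM(n_components=4, covariance_type="diag",
                  n_iter=100, random_state=0)
hmm.fit(X, lengths)

def compress(path):
    # Collapse consecutive repeats: [0, 0, 1, 1, 2] -> [0, 1, 2].
    return [int(s) for j, s in enumerate(path) if j == 0 or s != path[j - 1]]

# Decode each seed's run into its path through the latent states.
# On real metrics, seeds that take the scenic route show extra states
# here: the detour states the hosts describe.
for i, run in enumerate(runs):
    print(f"seed {i}: state path = {compress(hmm.predict(run))}")

Comparing the learned transition structure (hmm.transmat_) across hyperparameter settings is one way to turn the 'maps, not just metrics' idea into a concrete diagnostic.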

By Mechanical Dirk