An automatically generated podcast about machine learning and natural language processing. The two fictional hosts talk about papers that I want to learn more about on my way to work. It's not good, b...
FAQs about Mechanical Dreams:
How many episodes does Mechanical Dreams have?
The podcast currently has 142 episodes available.
January 05, 2025: Optimal Linear Decay Learning Rate Schedules and Further Refinements (19 min)
In this episode:
• The Death of Cosine?: Introduction to the episode and the paper. Professor Norris expresses his skepticism about changing established habits like Cosine Annealing, while Linda teases a shake-up in the status quo.
• Theory vs. Reality: A discussion on the massive gap between theoretical learning rates (like 1/t) and what practitioners actually use. Linda explains why the theory has historically failed to match practice.
• Linear Decay Takes the Crown: Linda explains the paper's core theoretical finding: that a simple linear decay is optimal for the last iterate of SGD, challenging the dominance of Cosine Decay.
• Refining the Schedule: Deep dive into the 'Refinement' technique, where past gradient norms dictate the future schedule. Discussion on how 'warm-up' naturally emerges from the mathematics rather than being a heuristic hack.
• The Verdict and The Future: Reviewing the experimental results across Vision and LLMs. Final thoughts on whether practitioners should actually switch to Linear Decay or Refined schedules.
December 20, 2024: Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers (10 min)
December 20, 2024: Efficient and Approximate Per-Example Gradient Norms for Gradient Noise Scale (12 min)
December 13, 2024: Rephrasing natural text data with different languages and quality levels (12 min)
December 12, 2024: Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs (12 min)
December 09, 2024: Model soups - averaging weights of multiple fine-tuned models improves accuracy without increasing inference time (7 min)
December 06, 2024: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models (14 min)
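
For listeners who want a concrete picture of the two schedules compared in the January 05, 2025 episode, here is a minimal Python sketch of a linear decay schedule (with optional linear warm-up) next to a cosine decay schedule. The function names, parameter names, and the exact warm-up formula are assumptions made for illustration; they are not taken from the paper, and the gradient-norm-based 'Refinement' technique mentioned in the episode is not implemented here.

```python
import math


def linear_decay(step, total_steps, peak_lr, warmup_steps=0):
    """Illustrative schedule: linear warm-up to peak_lr, then linear decay to zero.

    All names are hypothetical; this is not the paper's refined schedule.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp up linearly; reaches peak_lr on the last warm-up step.
        return peak_lr * (step + 1) / warmup_steps
    # Decay linearly from peak_lr (at step == warmup_steps) to 0 (at total_steps).
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / (total_steps - warmup_steps)


def cosine_decay(step, total_steps, peak_lr):
    """Illustrative half-cosine decay from peak_lr down to zero, for comparison."""
    progress = min(step / total_steps, 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print both schedules at a few points of a 100-step run.
    for step in (0, 25, 50, 75, 100):
        lin = linear_decay(step, total_steps=100, peak_lr=1.0)
        cos = cosine_decay(step, total_steps=100, peak_lr=1.0)
        print(f"step {step:3d}  linear {lin:.3f}  cosine {cos:.3f}")
```

Both schedules start at the peak rate and end at zero; the cosine curve stays higher in the first half of training and lower in the second half, while the linear schedule shrinks the rate by the same amount every step.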