In this episode:
• The Need for Speed: Microscaling Formats: Linda introduces the new low-precision MX (microscaling) formats for training LLMs, designed to cut compute and memory costs substantially. Professor Norris is intrigued but skeptical about the practical trade-offs. (A minimal sketch of block-wise microscaling follows this list.)
• When Good Training Goes Bad: The hosts discuss the core problem identified in the paper: severe training instabilities and sudden, unrecoverable loss spikes when using MX formats, especially at scale.
• It's the Layernorm, Stupid!: Linda explains how the researchers used a proxy model to diagnose the instabilities, tracing the root cause to a systematic gradient bias from quantizing layernorm parameters.
• The Hybrid Solution: Professor Norris and Linda discuss the paper's proposed mitigations, focusing on a clever hybrid-precision approach that uses low precision for weights and high precision for activations (see the second sketch below).
• Precision on a Budget: The episode concludes by showing how these mitigation strategies successfully stabilize training, allowing performance competitive with full-precision training while still saving compute.
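
To make the microscaling idea concrete, here is a minimal NumPy sketch of block-wise quantization in the spirit of MXFP4: each block of 32 elements shares one power-of-two scale, and each element is rounded to a coarse FP4-like grid. The block size, grid values, and function name are illustrative assumptions, not the paper's bit-exact implementation.

```python
import numpy as np

BLOCK = 32        # MX block size: elements sharing one scale
ELEM_MAX = 6.0    # largest magnitude on the FP4 (E2M1) element grid

def mx_quantize(x: np.ndarray) -> np.ndarray:
    """Simulate block-wise microscaling quantization of a 1-D array.

    Each BLOCK-sized block shares a power-of-two scale (the MX shared
    exponent); elements are then rounded to a coarse FP4-like grid.
    A simplified illustration, not a bit-exact MX implementation.
    """
    grid = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 values
    out = np.empty_like(x, dtype=np.float64)
    for start in range(0, len(x), BLOCK):
        block = x[start:start + BLOCK].astype(np.float64)
        amax = np.max(np.abs(block))
        if amax == 0.0:
            out[start:start + BLOCK] = 0.0
            continue
        # Shared power-of-two scale so the block's max lands near the
        # top of the element grid.
        scale = 2.0 ** np.floor(np.log2(amax / ELEM_MAX))
        scaled = block / scale
        # Round each element to the nearest grid value; anything above
        # the grid max is clamped by nearest-value rounding.
        idx = np.argmin(np.abs(np.abs(scaled)[:, None] - grid[None, :]), axis=1)
        out[start:start + BLOCK] = np.sign(scaled) * grid[idx] * scale
    return out

x = np.random.randn(128).astype(np.float32)
xq = mx_quantize(x)
print("mean abs quantization error:", np.mean(np.abs(x - xq)))
```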
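
And to illustrate the hybrid-precision mitigation, the toy forward pass below casts only the weight matrix through a stand-in low-precision quantizer (fake_lowp, a hypothetical helper), while activations and the layernorm scale stay in full precision. This mirrors the scheme discussed in the episode; it is a sketch, not the authors' actual code.

```python
import numpy as np

def fake_lowp(t: np.ndarray) -> np.ndarray:
    """Stand-in for a low-precision cast: keep ~2 mantissa bits per value.
    Illustration only; a real run would use hardware MX kernels."""
    exp = 2.0 ** np.floor(np.log2(np.abs(t) + 1e-30))
    return np.round(t / exp * 4) * exp / 4

def layernorm(h, gamma, eps=1e-5):
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return gamma * (h - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 64))           # activations: high precision
w = rng.standard_normal((64, 64)) * 0.1    # weights: quantized below
gamma = np.ones(64)                        # layernorm scale: high precision

# Hybrid scheme: only the weight matrix goes through the low-precision
# cast; activations and the layernorm parameters stay in full precision,
# which is what avoids the biased layernorm gradients.
h = x @ fake_lowp(w)
y = layernorm(h, gamma)
print(y.shape)
```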