AI: post transformers

SPAM: Stabilizing LLM Training with Spike-Aware Optimization



This February 2025 research addresses the critical issue of training instability in Large Language Models (LLMs), which often stems from sudden, massive "gradient spikes" that can be thousands of times larger than typical gradients. The authors introduce Spike-Aware Adam with Momentum Reset (SPAM), a novel optimizer designed to counteract these spikes through periodic momentum resets and spike-aware gradient clipping, which scales down rather than zeroes out large gradients. Experiments demonstrate that SPAM consistently outperforms existing optimizers like Adam and Adafactor across various LLM sizes during both pre-training and fine-tuning. Furthermore, SPAM offers a memory-efficient version leveraging sparse momentum, enabling better performance under memory constraints compared to other state-of-the-art memory-efficient optimizers. The study highlights the detrimental impact of gradient spikes and presents an effective optimization strategy to enhance LLM training stability and resource efficiency.
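To make the two mechanisms concrete, below is a minimal, hypothetical sketch of one optimizer step combining periodic momentum reset with spike-aware gradient clipping, written in PyTorch-style Python. The threshold `theta`, the `reset_interval`, and the helper name `spam_like_step` are illustrative assumptions, not the paper's exact algorithm or hyperparameters; consult the arXiv source below for the authors' formulation.

```python
import torch

def spam_like_step(param, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
                   eps=1e-8, theta=5000.0, reset_interval=500):
    """Illustrative Adam-style update with the two SPAM ideas described above:
    periodic momentum reset and spike-aware gradient clipping.
    theta and reset_interval are assumed values for the sketch."""
    # Periodic momentum reset: restart the Adam moments from zero every
    # reset_interval steps so a past spike cannot linger in m and v.
    if step % reset_interval == 0:
        m.zero_()
        v.zero_()

    # Spike-aware clipping: where grad^2 greatly exceeds the running second
    # moment (by factor theta), scale the gradient down toward sqrt(theta * v)
    # while keeping its sign, instead of zeroing it out.
    spike = (v > 0) & (grad.pow(2) > theta * v)
    clipped = torch.where(spike, grad.sign() * (theta * v).sqrt(), grad)

    # Standard Adam moment updates and bias correction on the clipped gradient,
    # counting steps since the last reset.
    m.mul_(beta1).add_(clipped, alpha=1 - beta1)
    v.mul_(beta2).addcmul_(clipped, clipped, value=1 - beta2)
    t = (step % reset_interval) + 1
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param.sub_(lr * m_hat / (v_hat.sqrt() + eps))

# Toy usage on a single parameter tensor, with a simulated gradient spike.
p = torch.zeros(4)
m, v = torch.zeros_like(p), torch.zeros_like(p)
for step in range(1000):
    g = torch.randn_like(p)
    if step == 300:
        g *= 1000.0  # simulated spike thousands of times larger than usual
    spam_like_step(p, g, m, v, step)
```

The design intuition is that zeroing a spiked gradient discards its direction entirely, whereas scaling it relative to the second-moment estimate keeps the useful signal while bounding its influence on the momentum buffers.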

Source:

https://arxiv.org/pdf/2501.06842


By mcgrof