AI Post Transformers

Keel: Post-LayerNorm Is Back: Stable, ExpressivE, and Deep



The January 27, 2026 ByteDance paper "Post-LayerNorm Is Back: Stable, ExpressivE, and Deep" introduces the Keel architecture, which addresses the optimization instability of deep Transformers by reviving the Post-LayerNorm formulation. Post-LayerNorm theoretically offers better expressivity than the standard Pre-LayerNorm but has historically failed to train at scale because of vanishing gradients. By replacing the standard ResNet-style residual pathway with a Highway-style connection and injecting an additional normalization step into the residual branch, Keel preserves gradient magnitude and enables stable training at depths exceeding 1,000 layers without requiring complex initialization. Empirical evaluations show that Keel tolerates significantly higher learning rates and outperforms Pre-LayerNorm baselines on reasoning and coding tasks, which the paper presents as evidence that depth scaling remains a viable path for improving model performance.

The Keel paper argues that extending context length only expands information access rather than improving "fundamental expressivity": longer contexts allow models to process more data, but they do not inherently unlock the capacity for complex hierarchical reasoning. Whereas Gemini 1.5 demonstrates that long context drives In-Context Learning (ICL) and reduces predictive uncertainty (NLL) on retrieval-heavy tasks such as learning a language from a manual, Keel contends that this does not equate to the "qualitatively new behaviors" unlocked by depth scaling. Consequently, Keel frames depth as the superior, albeit historically unstable, axis for improving model reasoning (particularly in math and code), contrasting the "diminishing returns" and high cost of context scaling with the robust expressivity gains achieved by stabilizing deeper networks.

Sources:
Keel: https://arxiv.org/pdf/2601.19895
The Gemini 1.5 paper with its power law: https://arxiv.org/pdf/2403.05530
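
To make the difference between the block layouts mentioned in the summary concrete, here is a minimal pure-Python sketch. It is not the paper's formulation: the toy `sublayer`, the fixed scalar `alpha` standing in for the Highway-style skip weighting, and the function names are all assumptions chosen for illustration. It only shows where the normalization sits in each variant and how that affects activation scale as blocks are stacked.

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize a vector to zero mean and (approximately) unit variance.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def sublayer(x):
    # Toy stand-in for an attention or MLP sublayer (assumption).
    return [math.tanh(0.5 * v) for v in x]

def pre_ln_block(x, f=sublayer):
    # Pre-LayerNorm: normalize the branch input, leave the skip path untouched.
    h = f(layer_norm(x))
    return [a + b for a, b in zip(x, h)]

def post_ln_block(x, f=sublayer):
    # Post-LayerNorm: normalize after the residual sum.
    h = f(x)
    return layer_norm([a + b for a, b in zip(x, h)])

def highway_post_ln_block(x, f=sublayer, alpha=2.0):
    # Hypothetical Keel-like variant: weighted skip path (alpha is an
    # illustrative constant, not a value from the paper), an extra
    # normalization inside the residual branch, then the Post-LN output norm.
    h = f(layer_norm(x))
    return layer_norm([alpha * a + b for a, b in zip(x, h)])

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

x = [1.0, -2.0, 0.5, 3.0]
pre = post = hw = x
for _ in range(50):  # stack 50 toy blocks of each kind
    pre = pre_ln_block(pre)
    post = post_ln_block(post)
    hw = highway_post_ln_block(hw)

print("Pre-LN RMS:", rms(pre))
print("Post-LN RMS:", rms(post))
print("Highway Post-LN RMS:", rms(hw))
```

In this toy setting the Pre-LN activations grow with depth because the skip path is never renormalized, while both Post-LN variants keep the output at roughly unit scale. The sketch says nothing about gradient flow, which is the property the paper is actually concerned with.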

AI Post Transformers, by mcgrof