This paper (June 8, 2025), a collaboration between the University of Texas and NYU, identifies a structural inefficiency in Large Language Models (LLMs): the self-attention matrices in many deeper transformer layers collapse to a near rank-one structure, leaving those layers, which the authors term "lazy layers," largely redundant. To address this, the authors propose Inheritune, a training method that builds smaller, better-performing models by inheriting the potent early layers of a larger pre-trained model and then progressively expanding and retraining the compact architecture; a small PyTorch sketch of this procedure appears below. Experiments, primarily with GPT-2 models of various sizes, show that models trained with Inheritune match or exceed the performance of their larger counterparts while using significantly fewer layers, effectively enabling model compression. The analysis further suggests that lazy layers carry little transferable knowledge, justifying their removal or progressive retraining to create more efficient LLMs. Source: https://arxiv.org/pdf/2404.08634
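
The inherit-then-grow idea can be pictured with a short PyTorch / Hugging Face sketch. This is a minimal illustration, not the authors' released implementation: the function names (`inherit_early_layers`, `grow_by_one_block`), the choice of `gpt2-large` as the reference model, the value of `k`, and the strategy of copying the last block when expanding are all assumptions made for clarity.

```python
# Sketch of an Inheritune-style setup (illustrative, not the paper's code):
# copy the first k transformer blocks, embeddings, and final layer norm from
# a larger pre-trained GPT-2 into a smaller model, then grow it block by block
# during continued training.
import copy
from transformers import GPT2Config, GPT2LMHeadModel


def inherit_early_layers(reference_name: str = "gpt2-large", k: int = 6) -> GPT2LMHeadModel:
    """Build a small GPT-2 whose first k blocks are inherited from a larger model."""
    reference = GPT2LMHeadModel.from_pretrained(reference_name)
    ref_cfg = reference.config

    # Same width and head count as the reference so inherited weights fit; fewer layers.
    small_cfg = GPT2Config(
        n_layer=k,
        n_embd=ref_cfg.n_embd,
        n_head=ref_cfg.n_head,
        n_positions=ref_cfg.n_positions,
        vocab_size=ref_cfg.vocab_size,
    )
    small = GPT2LMHeadModel(small_cfg)

    # Copy token/position embeddings and the final layer norm.
    small.transformer.wte.load_state_dict(reference.transformer.wte.state_dict())
    small.transformer.wpe.load_state_dict(reference.transformer.wpe.state_dict())
    small.transformer.ln_f.load_state_dict(reference.transformer.ln_f.state_dict())

    # Copy the first k (early, non-"lazy") transformer blocks.
    for i in range(k):
        small.transformer.h[i].load_state_dict(reference.transformer.h[i].state_dict())

    small.tie_weights()  # GPT-2 ties the LM head to the token embeddings
    return small


def grow_by_one_block(model: GPT2LMHeadModel) -> GPT2LMHeadModel:
    """Progressive-expansion step: append one block (here, a copy of the last one)."""
    new_block = copy.deepcopy(model.transformer.h[-1])
    model.transformer.h.append(new_block)
    model.config.n_layer += 1
    return model


if __name__ == "__main__":
    student = inherit_early_layers("gpt2-large", k=6)
    # ... train `student` on the target data until validation loss plateaus ...
    student = grow_by_one_block(student)
    # ... continue training; repeat growth until the size/quality target is met.
```

In this reading, the inherited early blocks supply most of the transferable knowledge, and the expand-and-retrain loop replaces the near rank-one "lazy" layers of the reference model rather than copying them.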