
This academic paper explores length generalization transfer in Transformer language models, investigating their ability to extrapolate knowledge from shorter inputs to longer, unseen ones. The authors demonstrate that training a model on a related "auxiliary task" with longer inputs can significantly improve the generalization of a "main task" trained only on shorter examples, across diverse domains like arithmetic, string manipulation, and maze navigation. This transfer effect is also observed in pretrained language models, suggesting they develop reusable computational frameworks. Furthermore, the research provides mechanistic evidence that this transfer correlates with the shared use of attention heads between related tasks, indicating a compositional reuse of inductive structure.
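To make the described setup concrete, here is a minimal sketch of how such a training mix might be constructed: a "main task" capped at short input lengths alongside a related "auxiliary task" that covers longer inputs. The specific tasks (integer addition and string reversal), length cutoffs, and data generators are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a length-generalization-transfer training mix (assumed setup, not the paper's exact one).
import random

def make_addition_example(n_digits):
    """Main task: n-digit integer addition, serialized as a (prompt, answer) text pair."""
    a = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    b = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    return f"{a}+{b}=", str(a + b)

def make_reverse_example(n_chars):
    """Auxiliary task: reverse a random digit string (a related string-manipulation task)."""
    s = "".join(random.choice("0123456789") for _ in range(n_chars))
    return f"reverse {s}=", s[::-1]

def build_training_mix(n_main=10_000, n_aux=10_000, main_max_len=10, aux_max_len=40):
    """The main task only sees short inputs; the auxiliary task spans much longer ones.
    The paper's finding is that the auxiliary task's longer inputs help the main task
    generalize beyond main_max_len at evaluation time."""
    data = []
    for _ in range(n_main):
        data.append(make_addition_example(random.randint(1, main_max_len)))
    for _ in range(n_aux):
        data.append(make_reverse_example(random.randint(1, aux_max_len)))
    random.shuffle(data)
    return data

if __name__ == "__main__":
    mix = build_training_mix()
    print(mix[:3])  # a few (prompt, target) pairs from the mixed curriculum
```

In this sketch, length generalization would be measured by evaluating the addition task on inputs longer than main_max_len, with and without the auxiliary data included in training.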