


This paper explores how the statistical properties of pretraining data determine the success of in-context learning (ICL) in transformer models. By developing a theoretical framework that unifies task selection and generalization, the authors demonstrate that heavy-tailed pretraining distributions significantly enhance a model's robustness to distribution shifts. Conversely, light-tailed distributions excel at familiar tasks and require fewer examples to generalize effectively. The study also shows that stronger temporal dependencies within data sequences increase the number of training tasks needed for reliable performance. Through experiments on numerical tasks such as stochastic differential equations, the findings suggest that careful distribution design is essential for building reliable and adaptable AI systems.
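To make the heavy- versus light-tailed contrast concrete, here is a minimal, hypothetical sketch (not from the paper): it samples task parameters from a Gaussian (light-tailed) prior and a Student-t (heavy-tailed) prior and measures how often each produces "extreme" tasks far from the typical range. The threshold of 5 and the distribution choices are illustrative assumptions, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tasks = 100_000

# Light-tailed task prior: Gaussian -- extreme task parameters are vanishingly rare.
light = rng.normal(loc=0.0, scale=1.0, size=n_tasks)

# Heavy-tailed task prior: Student-t with 2 degrees of freedom -- rare but
# extreme tasks appear regularly, so pretraining routinely "sees" shifted regimes.
heavy = rng.standard_t(df=2, size=n_tasks)

# Fraction of sampled tasks landing far outside the typical range (|theta| > 5):
frac_light = np.mean(np.abs(light) > 5)
frac_heavy = np.mean(np.abs(heavy) > 5)
print(f"light-tailed tail mass beyond 5: {frac_light:.5f}")
print(f"heavy-tailed tail mass beyond 5: {frac_heavy:.5f}")
```

Under these assumptions, the heavy-tailed prior places a few percent of its mass on extreme tasks while the Gaussian prior places essentially none there, which is the intuition behind heavy tails improving robustness to distribution shift.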
By Enoch H. Kang