


This paper explores how the statistical properties of pretraining data determine the success of in-context learning (ICL) in transformer models. By developing a theoretical framework that unifies task selection and generalization, the authors demonstrate that heavy-tailed pretraining distributions significantly enhance a model's robustness to distribution shifts. Conversely, light-tailed distributions excel at familiar tasks and require fewer examples to generalize effectively. The study also shows that stronger temporal dependencies within data sequences increase the number of training tasks needed for reliable performance. Through experiments on numerical tasks such as stochastic differential equations, the findings suggest that careful distribution design is essential for building reliable and adaptable AI systems.
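To make the heavy- versus light-tailed contrast concrete, here is a minimal, hypothetical sketch (not from the paper): it samples task parameters from a Gaussian (light-tailed) prior and a Student-t (heavy-tailed) prior and measures how often each produces "extreme" tasks far from the typical range. The threshold of 5 and the distribution choices are illustrative assumptions, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tasks = 100_000

# Light-tailed task prior: Gaussian -- extreme task parameters are vanishingly rare.
light = rng.normal(loc=0.0, scale=1.0, size=n_tasks)

# Heavy-tailed task prior: Student-t with 2 degrees of freedom -- rare but
# extreme tasks appear regularly, so pretraining routinely "sees" shifted regimes.
heavy = rng.standard_t(df=2, size=n_tasks)

# Fraction of sampled tasks landing far outside the typical range (|theta| > 5):
frac_light = np.mean(np.abs(light) > 5)
frac_heavy = np.mean(np.abs(heavy) > 5)
print(f"light-tailed tail mass beyond 5: {frac_light:.5f}")
print(f"heavy-tailed tail mass beyond 5: {frac_heavy:.5f}")
```

Under these assumptions, the heavy-tailed prior places a few percent of its mass on extreme tasks while the Gaussian prior places essentially none there, which is the intuition behind heavy tails improving robustness to distribution shift.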
By Enoch H. Kang