


This paper introduces the first theory capable of quantitatively predicting neural scaling law exponents for large language models based solely on the statistical properties of natural language. The researchers identify two primary drivers of performance: the decay of next-token conditional entropy as context length increases, and the weakening of pairwise token correlations as the separation between tokens grows. By combining these metrics, they derive a first-principles formula that accurately forecasts how test loss improves with larger training datasets, without requiring synthetic data or free parameters. Their theoretical predictions closely match experimental results from GPT-2 and LLaMA-style models trained on the TinyStories and WikiText benchmarks. Ultimately, the study suggests that a model's learning efficiency is fundamentally governed by a data-dependent prediction horizon, where more data progressively unlocks the ability to utilize longer-range linguistic patterns.
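As a rough illustration of the kind of relationship being predicted (not the paper's actual derivation), a data scaling law is often written as L(D) = L_inf + A * D^(-alpha), where D is the dataset size, L_inf the irreducible loss, and alpha the scaling exponent the theory aims to predict. The helper below, with hypothetical naming of my own, recovers alpha from measured losses via a log-log fit:

```python
import numpy as np

def fit_scaling_exponent(dataset_sizes, losses, irreducible_loss):
    """Estimate alpha and A in L(D) = L_inf + A * D**(-alpha).

    Subtracting the irreducible loss leaves A * D**(-alpha), which is
    linear in log-log space: log(L - L_inf) = log(A) - alpha * log(D).
    """
    reducible = np.asarray(losses) - irreducible_loss
    slope, intercept = np.polyfit(np.log(dataset_sizes), np.log(reducible), 1)
    return -slope, np.exp(intercept)  # (alpha, A)

if __name__ == "__main__":
    # Synthetic measurements generated with alpha = 0.3, A = 5.0, L_inf = 1.2
    D = np.logspace(5, 9, 9)
    loss = 1.2 + 5.0 * D ** (-0.3)
    alpha, A = fit_scaling_exponent(D, loss, irreducible_loss=1.2)
    print(f"alpha = {alpha:.3f}, A = {A:.2f}")  # recovers alpha = 0.300
```

The paper's contribution, per the summary above, is predicting alpha from language statistics alone rather than fitting it empirically as this sketch does.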
By Enoch H. Kang