This episode explores a recent paper that extends neural scaling laws to predict real-world task performance rather than just training loss, treating context length as a first-order variable. The episode discusses how traditional scaling laws from Kaplan et al. (2020) and Chinchilla (Hoffmann et al., 2022) successfully predicted pretraining metrics but did not address downstream task accuracy or the impact of in-context learning with varying context windows. The paper proposes a context-aware scaling law with dual power-law terms for compute and context, plus a penalty term for exceeding the trained context limit, offering a simpler alternative to existing multi-stage prediction methods. Listeners interested in the mathematical foundations of LLM capabilities, and in the gap between training metrics and practical performance, will find this discussion particularly valuable.
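To make the idea concrete, here is a minimal sketch of what a scaling law with dual power-law terms and a context-limit penalty could look like. The functional form, parameter names, and coefficient values below are illustrative assumptions for intuition only, not the formula from the paper:

```python
import math

def predicted_accuracy(compute, context, trained_context,
                       a=0.3, alpha=0.1, b=0.5, beta=0.15, gamma=0.5):
    """Illustrative context-aware scaling form (hypothetical parameters).

    Error falls as a power law in both training compute and context
    length, with an extra penalty once the evaluation context exceeds
    the context window the model was trained on.
    """
    # Dual power-law terms: one for compute, one for context length
    error = a * compute ** (-alpha) + b * context ** (-beta)
    if context > trained_context:
        # Penalty term for exceeding the trained context limit
        error += gamma * math.log(context / trained_context)
    # Clamp to a valid accuracy range
    return max(0.0, min(1.0, 1.0 - error))
```

Under this toy form, predicted accuracy rises with compute and with in-window context, but degrades once prompts exceed the trained context window, which is the qualitative behavior the episode describes.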
Sources:
1. Predicting Task Performance with Context-aware Scaling Laws — Kyle Montgomery, David Park, Jianhong Tu, Michael Bendersky, Beliz Gunel, Dawn Song, Chenguang Wang, 2025
http://arxiv.org/abs/2510.14919v1
2. Scaling Laws for Neural Language Models — Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, et al., 2020
https://scholar.google.com/scholar?q=Scaling+Laws+for+Neural+Language+Models
3. Training Compute-Optimal Large Language Models — Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, et al. (DeepMind), 2022
https://scholar.google.com/scholar?q=Training+Compute-Optimal+Large+Language+Models
4. Inverse Scaling: When Bigger Isn't Better — Ian McKenzie, Alexander Lyzhov, Michael Pieler, et al., 2023
https://scholar.google.com/scholar?q=Inverse+Scaling:+When+Bigger+Isn't+Better
5. Predictability and Surprise in Large Generative Models — Deep Ganguli, Danny Hernandez, Liane Lovitt, et al. (Anthropic), 2022
https://scholar.google.com/scholar?q=Predictability+and+Surprise+in+Large+Generative+Models
6. YaRN: Efficient Context Window Extension of Large Language Models — Peng et al., 2024
https://scholar.google.com/scholar?q=YaRN:+Efficient+Context+Window+Extension+of+Large+Language+Models
7. Emergent Abilities of Large Language Models — Wei et al., 2022
https://scholar.google.com/scholar?q=Emergent+Abilities+of+Large+Language+Models
8. Are Emergent Abilities of Large Language Models a Mirage? — Schaeffer et al., 2023
https://scholar.google.com/scholar?q=Are+Emergent+Abilities+of+Large+Language+Models+a+Mirage?
9. Predicting Task Performance with Context-aware Scaling Laws — podcast episode, 2026
https://podcast.do-not-panic.com/episodes/2026-03-16-predicting-task-performance-with-context-8abe00.mp3
10. The Art of Scaling Reinforcement Learning Compute for LLMs — podcast episode, 2026
https://podcast.do-not-panic.com/episodes/2026-03-16-the-art-of-scaling-reinforcemen-learnin-95a3f5.mp3
Interactive Visualization: Predicting Task Performance with Context-aware Scaling Laws