The Information Bottleneck

Training Is Nothing Like Learning with Naomi Saphra (Harvard)



Naomi Saphra, Kempner Research Fellow at Harvard and incoming Assistant Professor at Boston University, joins us to explain why you can't do interpretability without understanding training dynamics, in the same way you can't do biology without evolution.

Naomi argues that many structures researchers find inside trained models are vestigial: they mattered early in training but are meaningless by the end. Grokking is one case of a broader phenomenon: models go through multiple consecutive phase transitions during training, driven by symmetry breaking and head specialization, but the smooth loss curve hides all of it. We talk about why training is nothing like human learning, and why our intuitions about what's hard for models are consistently wrong: code in pretraining helps language reasoning, tokenization drives behaviors people attribute to deeper cognition, and language already encodes everything humans care about. We also get into why SAEs are basically topic models, the Platonic representation hypothesis, using AI to decode animal communication, and why non-determinism across training runs is a real problem that RL and MoE might be making worse.

Timeline:

(00:12) Introduction and guest welcome

(01:01) Why training dynamics matter - the evolutionary biology analogy

(03:05) Jennifer Aniston neurons and the danger of biological parallels

(04:48) What is grokking and why it's one instance of a broader phenomenon

(08:25) Phase transitions, symmetry breaking, and head specialization

(11:53) Double descent, overfitting, and the death of classical train-test splits

(15:10) Training is nothing like learning

(16:08) Scaling axes - data, model size, compute, and why they're not interchangeable

(19:29) Data quality, code as reasoning fuel, and GPT-2's real contribution

(20:43) Multilingual models and the interlingua hypothesis

(25:58) The Platonic representation hypothesis and why image classification was always multimodal

(29:12) Sparse autoencoders, interpretability, and Marr's levels

(37:32) Can we ever truly understand what models know?

(43:59) The language modality chauvinist argument

(51:55) Vision, redundancy, and self-supervised learning

(57:18) World models - measurable capabilities over philosophical definitions

(1:00:14) Is coding really a solved task?

(1:04:18) Non-determinism, scaling laws, and why one training run isn't enough

(1:10:12) Naomi's new lab at BU and recruiting

Music:

  • "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
  • "Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
  • Changes: trimmed

About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.
