Mechanical Dreams

The Pitfalls of Next-Token Prediction


In this episode:
• Introduction: More Than an Improv Artist?
• The Two Faces of Prediction: Linda breaks down the key difference between teacher-forced training and autoregressive inference. Professor Norris likens this to learning to drive with an instructor who constantly corrects you, which doesn't prepare you for driving alone.
• The Clever Hans Cheat: Linda explains the paper's core concept: 'teacher-forcing' can lead to a 'Clever Hans cheat,' where the model learns shortcuts from the training data instead of the actual task. This results in the first, most crucial token of a plan becoming 'indecipherable.'
• Breaking the Cheat Code: The hosts discuss the paper's proposed solutions, like 'teacherless training,' which forces the model to look ahead and plan instead of relying on shortcuts. Professor Norris notes this is like removing the training wheels, forcing the model to learn the hard way.
• Conclusion: Recalibrating the Paradigm: Linda and Professor Norris conclude that the paper makes a strong, empirically backed case that the teacher-forcing training objective is a core limitation. They agree it's a pivotal step in moving the field toward models that can genuinely plan.
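The training/inference mismatch discussed in the episode can be sketched in a few lines of Python. This is a toy illustration, not the paper's method: a bigram lookup table stands in for the model, and one deliberately wrong entry shows how a single error behaves differently under the two decoding regimes.

```python
# Toy "model": a bigram table mapping each token to a predicted next token.
# One entry is deliberately wrong ("sat" -> "down" instead of "on") to show
# how errors compound. All names here are illustrative.
BIGRAM = {
    "<s>": "the", "the": "cat", "cat": "sat",  # learned correctly
    "sat": "down",                             # wrong: gold text says "on"
    "on": "the", "down": "hard",
}
GOLD = ["<s>", "the", "cat", "sat", "on", "the"]

def teacher_forced(gold):
    # Training-time view: at every step the model is conditioned on the
    # gold prefix, so a bad prediction never corrupts the next input.
    return [BIGRAM[tok] for tok in gold[:-1]]

def autoregressive(start, steps):
    # Inference-time view: the model consumes its own previous output,
    # so the first mistake drags every later token off the rails.
    out, tok = [], start
    for _ in range(steps):
        tok = BIGRAM.get(tok, "<unk>")
        out.append(tok)
    return out

print(teacher_forced(GOLD))      # only the prediction at position 3 is wrong
print(autoregressive("<s>", 5))  # wrong from position 3 onward
```

Under teacher forcing the single mistake stays isolated, so the training loss barely penalizes it; decoded autoregressively, the same mistake derails everything after it. This is the gap Professor Norris's driving-instructor analogy points at.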

Mechanical Dreams, by Mechanical Dirk