Mechanical Dreams

The Pitfalls of Next-Token Prediction


In this episode:
• Introduction: More Than an Improv Artist?
• The Two Faces of Prediction: Linda breaks down the key difference between teacher-forced training and autoregressive inference. Professor Norris likens this to learning to drive with an instructor who constantly corrects you, which doesn't prepare you for driving alone.
• The Clever Hans Cheat: Linda explains the paper's core concept: 'teacher-forcing' can lead to a 'Clever Hans cheat,' where the model learns shortcuts from the training data instead of the actual task. This results in the first, most crucial token of a plan becoming 'indecipherable.'
• Breaking the Cheat Code: The hosts discuss the paper's proposed solutions, like 'teacherless training,' which forces the model to look ahead and plan instead of relying on shortcuts. Professor Norris notes this is like removing the training wheels, forcing the model to learn the hard way.
• Conclusion: Recalibrating the Paradigm: Linda and Professor Norris conclude that the paper makes a strong, empirically backed case that the teacher-forcing training objective is a core limitation. They agree it's a pivotal step in moving the field toward models that can genuinely plan.
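The training/inference mismatch discussed in the episode can be sketched in a few lines of Python. This is a toy illustration, not the paper's method: a bigram lookup table stands in for the model, and one deliberately wrong entry shows how a single error behaves differently under the two decoding regimes.

```python
# Toy "model": a bigram table mapping each token to a predicted next token.
# One entry is deliberately wrong ("sat" -> "down" instead of "on") to show
# how errors compound. All names here are illustrative.
BIGRAM = {
    "<s>": "the", "the": "cat", "cat": "sat",  # learned correctly
    "sat": "down",                             # wrong: gold text says "on"
    "on": "the", "down": "hard",
}
GOLD = ["<s>", "the", "cat", "sat", "on", "the"]

def teacher_forced(gold):
    # Training-time view: at every step the model is conditioned on the
    # gold prefix, so a bad prediction never corrupts the next input.
    return [BIGRAM[tok] for tok in gold[:-1]]

def autoregressive(start, steps):
    # Inference-time view: the model consumes its own previous output,
    # so the first mistake drags every later token off the rails.
    out, tok = [], start
    for _ in range(steps):
        tok = BIGRAM.get(tok, "<unk>")
        out.append(tok)
    return out

print(teacher_forced(GOLD))      # only the prediction at position 3 is wrong
print(autoregressive("<s>", 5))  # wrong from position 3 onward
```

Under teacher forcing the single mistake stays isolated, so the training loss barely penalizes it; decoded autoregressively, the same mistake derails everything after it. This is the gap Professor Norris's driving-instructor analogy points at.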

Mechanical Dreams, by Mechanical Dirk