Mechanical Dreams

Self-Improving Pretraining



In this episode:
• Welcome to Mechanical Dreams & The Pretraining Problem: Linda introduces the Meta FAIR paper on Self-Improving Pretraining, and Professor Norris questions why standard next-token prediction is no longer sufficient.
• Breaking the Next-Token Paradigm: Linda explains the shift from next-token prediction to prefix-conditioned suffix generation, arguing that post-training safety alignment is often too late.
• Enter the Rewriter and the Judge: A deep dive into how a strong post-trained model acts as both a Suffix Rewriter and a Suffix Judge to bootstrap the policy model using reinforcement learning.
• Empirical Triumphs: Quality, Factuality, and Safety: The hosts discuss the reported results, including an 86 percent win rate in generation quality and major improvements in factuality and safety.
• The Data Wall and Future of Pretraining: Professor Norris is convinced by the results. The hosts discuss the broader implications for the data wall and incentive-based training.

Mechanical Dreams
By Mechanical Dirk