April 04, 2026

Self-Improving Pretraining

22 minutes

In this episode:
• Welcome to Mechanical Dreams & The Pretraining Problem: Linda introduces the Meta FAIR paper on Self-Improving Pretraining, and Professor Norris questions why standard next-token prediction is no longer sufficient.
• Breaking the Next-Token Paradigm: Linda explains the shift from next-token prediction to prefix-conditioned suffix generation, arguing that post-training safety alignment is often too late.
• Enter the Rewriter and the Judge: A deep dive into how a strong post-trained model acts as a Suffix Rewriter and a Suffix Judge to bootstrap the policy model using Reinforcement Learning.
• Empirical Triumphs: Quality, Factuality, and Safety: Discussing the massive empirical wins, including an 86 percent win rate in generation quality and major improvements in factuality and safety.
• The Data Wall and Future of Pretraining: Professor Norris is convinced by the results. The hosts discuss the broader implications for the data wall and incentive-based training.

By Mechanical Dirk
