Best AI papers explained

RL's Razor: Why Online RL Forgets Less


This paper explores why **Reinforcement Learning (RL) fine-tuning causes less catastrophic forgetting** than **Supervised Fine-Tuning (SFT)**, even when both reach similar performance on the new task. The authors introduce **"RL's Razor,"** the principle that **RL is implicitly biased toward solutions with minimal KL divergence from the original model's policy** when learning new tasks. Empirical and theoretical evidence supports this: **KL divergence measured on the new task is a strong predictor of forgetting**, regardless of the training algorithm. The core reason for RL's advantage is its **on-policy training**: because it samples from the model's current distribution and reweights those samples, its updates are more conservative and KL-minimal than SFT's, which pull the model toward fixed external annotations.
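
The contrast between the two update rules can be made concrete with a toy experiment. Below is a minimal sketch (not the paper's setup): a single-step categorical "policy" over a small action space, where the new task accepts several correct answers. SFT imitates one fixed annotated answer, while on-policy RL (plain REINFORCE) samples from the current policy and reweights by reward. The `Policy` class, the `rewarded` set, and the hyperparameters are illustrative assumptions; the point is only that, among policies that solve the task, the on-policy update tends to end up closer in KL to the base model.

```python
# Toy sketch: single-step categorical policy; all names and hyperparameters
# here are illustrative assumptions, not taken from the paper.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB = 8                                  # toy action space
rewarded = torch.tensor([2, 3, 5])         # the new task: any of these is correct
label = torch.tensor([3])                  # SFT's single annotated answer


class Policy(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(VOCAB))

    def dist(self):
        return torch.distributions.Categorical(logits=self.logits)


def kl_to_base(policy, base):
    # KL(pi_new || pi_base): the quantity the paper reports as a strong
    # predictor of how much the fine-tuned model forgets.
    return torch.distributions.kl_divergence(policy.dist(), base.dist()).item()


base = Policy()  # frozen "original model" (uniform here)

# SFT-style update: imitate the fixed external annotation, independent of
# what the current policy would have produced.
sft = Policy()
opt = torch.optim.SGD(sft.parameters(), lr=0.3)
for _ in range(300):
    loss = F.cross_entropy(sft.logits.unsqueeze(0), label)
    opt.zero_grad(); loss.backward(); opt.step()

# On-policy RL-style update (REINFORCE): sample from the *current* policy and
# reweight those samples by reward, so probability mass is only shifted among
# answers the model already produces.
rl = Policy()
opt = torch.optim.SGD(rl.parameters(), lr=0.3)
for _ in range(300):
    d = rl.dist()
    actions = d.sample((256,))
    reward = torch.isin(actions, rewarded).float()
    loss = -(reward * d.log_prob(actions)).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print(f"KL(SFT || base) = {kl_to_base(sft, base):.2f}")   # tends toward log 8 ~ 2.1
print(f"KL(RL  || base) = {kl_to_base(rl, base):.2f}")    # closer to log(8/3) ~ 1.0
```

Under these assumed settings, both policies solve the new task, but the RL policy typically ends up much closer to the base distribution, mirroring the paper's claim that on-policy reweighting selects a KL-minimal solution among those that achieve the reward.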

By Enoch H. Kang