Best AI papers explained

DELTA: How Does RL Unlock and Transfer New Algorithms in LLMs?



This paper introduces DELTA, a controlled benchmark of synthetic programming tasks—such as Manufactoria puzzles and BouncingSim physics simulations—specifically designed to isolate and evaluate whether reinforcement learning (RL) can teach large language models (LLMs) genuinely new reasoning procedures. The study demonstrates that RL can achieve **learnability beyond pretraining** on tasks where reference models previously failed completely, while naive binary-reward training fails to do so. This success is enabled by a **two-stage training strategy** that begins with dense, per-test-case rewards for warm-up before switching to strict binary rewards, which triggers an abrupt **grokking transition** from exploration to mastery. Furthermore, the analysis of transferability shows that the learned skills generalize robustly under **exploratory shifts** and **compose effectively** across combined skills, though performance remains poor under **transformative shifts** requiring qualitatively novel solution schemas.
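A minimal sketch of what such a two-stage reward schedule could look like, assuming each rollout is scored against a list of unit-test outcomes; the function and parameter names (`reward`, `test_results`, `warmup_steps`) are illustrative and not taken from the paper:

```python
from typing import List


def reward(test_results: List[bool], step: int, warmup_steps: int = 1000) -> float:
    """Illustrative two-stage reward for one rollout's unit-test outcomes.

    Stage 1 (warm-up): dense, per-test-case reward = fraction of tests passed.
    Stage 2: strict binary reward = 1.0 only if every test passes, else 0.0.
    """
    if step < warmup_steps:
        return sum(test_results) / len(test_results)  # dense partial credit
    return 1.0 if all(test_results) else 0.0          # strict binary reward


# Example: the same rollout earns partial credit during warm-up,
# but nothing once the strict binary stage begins.
print(reward([True, True, False], step=100))    # ~0.67
print(reward([True, True, False], step=5000))   # 0.0
```

The dense stage gives the policy a learning signal even when full solutions are still out of reach, while the later binary stage only rewards complete correctness, which is where the paper reports the grokking-like jump in performance.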

Best AI papers explained, by Enoch H. Kang