September 29, 2025

DELTA-Code: How Does RL Unlock and Transfer New Programming Algorithms in LLMs?

16 minutes

This research introduces DELTA-Code, a benchmark for investigating whether Large Language Models (LLMs) can use Reinforcement Learning (RL) to acquire and generalize genuinely novel reasoning strategies beyond their pre-trained or post-trained capabilities. The paper focuses on two questions: learnability, whether RL can help LLMs solve coding problems they previously could not solve, and transferability, whether the newly acquired skills generalize systematically to out-of-distribution test sets. The authors report a "striking grokking phase transition" in which RL-trained models suddenly reach high accuracy after an extended period of near-zero success, with training ingredients such as curriculum training and experience replay used to enable this learning.

By Enoch H. Kang
