Best AI papers explained

Reward is enough: LLMs are in-context reinforcement learners



Researchers have introduced In-Context Reinforcement Learning (ICRL), a novel prompting framework that enables large language models to self-improve during inference using only scalar rewards. Unlike traditional methods that rely on verbal feedback or costly retraining, ICRL treats the model’s context window as a dynamic experience buffer, concatenating past attempts with their corresponding reward signals. As this context grows, the model demonstrates an emergent ability to optimize its responses by learning from both successful and failed iterations in real time. Evaluations across diverse domains—including Olympiad-level mathematics, creative writing, and scientific simulations—show that this approach significantly outperforms established baselines such as Self-Refine and Reflexion. The study concludes that reinforcement learning is an intrinsic capability of pretrained models that can be elicited through minimal, reward-based instructions. Ultimately, ICRL offers a promising paradigm for test-time scaling, allowing agents to adapt to novel, complex tasks without updating their underlying parameters.
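The loop the description sketches—generate an attempt, score it with a scalar reward, append both to the context, and re-query—can be illustrated with a minimal Python sketch. Note that `generate` and `reward` here are hypothetical stand-ins (a toy integer-guessing "model" and a distance-based scorer), not the paper's actual model or evaluation setup:

```python
def generate(prompt: str) -> str:
    """Placeholder for an LLM call; a real version would query a model API.

    This toy version reads its own past attempts out of the prompt and
    guesses the next integer, mimicking in-context improvement.
    """
    tried = [int(line.split()[1]) for line in prompt.splitlines()
             if line.startswith("Attempt")]
    return str(max(tried, default=0) + 1)

def reward(answer: str, target: int = 3) -> float:
    """Hypothetical scalar reward: 1.0 at the target, lower with distance."""
    return 1.0 - abs(int(answer) - target) / 10.0

def icrl(task: str, budget: int = 5) -> str:
    """Run the ICRL-style loop: the context window is the experience buffer."""
    context = task
    best_answer, best_reward = "", float("-inf")
    for _ in range(budget):
        answer = generate(context)
        r = reward(answer)
        # Append the attempt with its numerical reward only—no verbal feedback.
        context += f"\nAttempt {answer} -> reward {r:.2f}"
        if r > best_reward:
            best_answer, best_reward = answer, r
    return best_answer
```

The key design point from the paper's framing is visible in the loop body: the only feedback appended to the context is a number, yet the accumulated (attempt, reward) history is enough for the model to steer toward higher-scoring outputs.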


Best AI papers explained, by Enoch H. Kang