


Researchers have developed **ORBIT**, a meta-reinforcement learning framework designed to improve the **in-context online learning** capabilities of large language models. While typical models are "static after shipping," ORBIT trains them to **adapt through trial and error** across multiple episodes without updating their underlying weights. This approach allows an agent to use its **context window** as a persistent memory to gather information in early attempts and exploit that knowledge to succeed in later trials. Experiments show that meta-trained models like **Qwen3-14B** can significantly outperform standard fine-tuning and match the performance of frontier models like **GPT-5.2** on entirely new tasks. Qualitative results indicate that these agents spontaneously learn to **reflect on past failures** and purposefully explore unfamiliar environments to solve complex problems. Ultimately, the study suggests that **scaling model size** further amplifies these emergent decision-making skills, providing a pathway toward more autonomous and adaptive AI agents.
By Enoch H. Kang
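The explore-then-exploit loop described above can be sketched in miniature. This is a hypothetical toy, not ORBIT's actual API: a frozen "agent" whose only memory is a growing context of past episodes. It tries each untried action first (gathering information), then conditions on that in-context history to exploit the best-performing action; no weights are ever updated. The environment, arm names, and helper functions are all illustrative assumptions.

```python
import random

def hidden_reward(arm, rng):
    # Environment unknown to the agent: arm 2 has the highest payoff
    # (illustrative values, not from the paper).
    means = {0: 0.2, 1: 0.5, 2: 0.9}
    return 1.0 if rng.random() < means[arm] else 0.0

def choose_arm(context, arms):
    # The agent's "policy" conditions only on its in-context history.
    tried = {entry["arm"] for entry in context}
    untried = [a for a in arms if a not in tried]
    if untried:
        return untried[0]  # early trials: gather information
    def avg(a):
        rewards = [e["reward"] for e in context if e["arm"] == a]
        return sum(rewards) / len(rewards)
    return max(arms, key=avg)  # later trials: exploit what was learned

def run(episodes=30, seed=0):
    rng = random.Random(seed)
    arms = [0, 1, 2]
    context = []  # persistent in-context memory across episodes
    for _ in range(episodes):
        arm = choose_arm(context, arms)
        context.append({"arm": arm, "reward": hidden_reward(arm, rng)})
    return context

history = run()
print([e["arm"] for e in history[:6]])
```

After one pass over the arms, the agent's choices in later episodes settle on the best-observed arm, mirroring how a meta-trained model would use early failed attempts in its context window to succeed in later trials.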