


Researchers have developed **ORBIT**, a meta-reinforcement learning framework designed to improve the **in-context online learning** capabilities of large language models. While typical models are "static after shipping," ORBIT trains them to **adapt through trial and error** across multiple episodes without updating their underlying weights. This approach allows an agent to use its **context window** as a persistent memory to gather information in early attempts and exploit that knowledge to succeed in later trials. Experiments show that meta-trained models like **Qwen3-14B** can significantly outperform standard fine-tuning and match the performance of frontier models like **GPT-5.2** on entirely new tasks. Qualitative results indicate that these agents spontaneously learn to **reflect on past failures** and purposefully explore unfamiliar environments to solve complex problems. Ultimately, the study suggests that **scaling model size** further amplifies these emergent decision-making skills, providing a pathway toward more autonomous and adaptive AI agents.
By Enoch H. Kang
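The explore-then-exploit loop described above can be sketched in miniature. This is a hypothetical toy, not ORBIT's actual API: a frozen "agent" whose only memory is a growing context of past episodes. It tries each untried action first (gathering information), then conditions on that in-context history to exploit the best-performing action; no weights are ever updated. The environment, arm names, and helper functions are all illustrative assumptions.

```python
import random

def hidden_reward(arm, rng):
    # Environment unknown to the agent: arm 2 has the highest payoff
    # (illustrative values, not from the paper).
    means = {0: 0.2, 1: 0.5, 2: 0.9}
    return 1.0 if rng.random() < means[arm] else 0.0

def choose_arm(context, arms):
    # The agent's "policy" conditions only on its in-context history.
    tried = {entry["arm"] for entry in context}
    untried = [a for a in arms if a not in tried]
    if untried:
        return untried[0]  # early trials: gather information
    def avg(a):
        rewards = [e["reward"] for e in context if e["arm"] == a]
        return sum(rewards) / len(rewards)
    return max(arms, key=avg)  # later trials: exploit what was learned

def run(episodes=30, seed=0):
    rng = random.Random(seed)
    arms = [0, 1, 2]
    context = []  # persistent in-context memory across episodes
    for _ in range(episodes):
        arm = choose_arm(context, arms)
        context.append({"arm": arm, "reward": hidden_reward(arm, rng)})
    return context

history = run()
print([e["arm"] for e in history[:6]])
```

After one pass over the arms, the agent's choices in later episodes settle on the best-observed arm, mirroring how a meta-trained model would use early failed attempts in its context window to succeed in later trials.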