
This paper proposes a new framework for evaluating the adaptive abilities of large language models (LLMs), which the authors term **in-context experiential learning**. To test an agent's ability to improve its performance by drawing on past interactions, the paper introduces the **Benchmark for Experiential Learning and Active Exploration (BELA)**. The benchmark simulates multi-episode product recommendation scenarios, using **rich real-world product data** and **scalable LLM-simulated user personas** to introduce realistic uncertainty. Agents must iteratively question the simulated customers to discover latent preferences and refine their strategies over time, a departure from single-interaction evaluation methods. Experimental results show that **current state-of-the-art LLMs consistently fail to demonstrate improvement** across successive episodes, which points to a major deficiency in their capacity for experiential learning. The research emphasizes the need for more resilient agentic systems that can reason effectively through **real-world uncertainty and dynamic feedback**.
By Enoch H. Kang
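
To make "improvement across successive episodes" concrete, here is a minimal sketch of how such a trend could be checked. It is not the paper's evaluation code. The names `run_episodes`, `improvement_slope`, and the `agent_step` callback are hypothetical, and scoring each episode with a single number is an assumption made for illustration.

```python
from typing import Callable, List, Tuple

History = List[Tuple[int, float]]


def run_episodes(agent_step: Callable[[int, History], float],
                 n_episodes: int) -> List[float]:
    """Run a hypothetical agent for several episodes.

    `agent_step` receives the episode index and the history of earlier
    (episode, score) pairs, and returns the score for this episode.
    An agent that learns from experience could use the history; one
    that does not will ignore it.
    """
    history: History = []
    scores: List[float] = []
    for episode in range(n_episodes):
        score = agent_step(episode, history)
        history.append((episode, score))
        scores.append(score)
    return scores


def improvement_slope(scores: List[float]) -> float:
    """Least-squares slope of score against episode index.

    A slope near zero (or negative) means no measurable improvement
    across episodes; a positive slope means scores trend upward.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two episodes to estimate a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(scores) / n
    num = sum((i - x_mean) * (s - y_mean) for i, s in enumerate(scores))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


# Illustrative stand-in agents, not real benchmark results.
flat_scores = run_episodes(lambda episode, history: 0.5, n_episodes=4)
rising_scores = run_episodes(lambda episode, history: float(episode), n_episodes=4)
```

Under these assumptions, an agent that ignores its history yields a slope of zero, which is the kind of flat trend the paper reports for current LLMs, while an agent whose scores rise steadily yields a positive slope.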