


This paper introduces SPICE (Shaping Policies In-Context with Ensemble prior), a novel Bayesian in-context reinforcement learning framework that adapts quickly to new tasks without updating model parameters. Unlike existing methods that rely on optimal data, SPICE uses a deep ensemble to learn a value prior from suboptimal trajectories and refines this prior at test time through Bayesian updates. This approach counters the behavior-policy bias of traditional supervised learning by using an Upper-Confidence Bound (UCB) rule to drive principled exploration. Theoretical analysis shows that SPICE achieves optimal regret bounds in both stochastic bandits and finite-horizon environments. Empirical results across various benchmarks confirm that the method is robust under distribution shifts and significantly outperforms prior meta-reinforcement learning approaches. Ultimately, the research offers a scalable framework for deploying reinforcement learning in real-world domains such as robotics and autonomous driving, where data may be limited or biased.
By Enoch H. Kang
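As a rough illustration only (not the paper's implementation), the ensemble-plus-UCB idea described above can be sketched as follows: an ensemble of value estimates stands in for SPICE's learned value prior, disagreement among ensemble members serves as uncertainty, and actions are chosen optimistically. All names and the exploration coefficient here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for an ensemble value prior: each row holds
# one ensemble member's value estimate for every action.
n_members, n_actions = 8, 4
ensemble_values = rng.normal(
    loc=[0.1, 0.5, 0.3, 0.2], scale=0.2, size=(n_members, n_actions)
)

beta = 1.0  # exploration coefficient (assumed, not from the paper)

mean = ensemble_values.mean(axis=0)  # prior mean value per action
std = ensemble_values.std(axis=0)    # ensemble disagreement = uncertainty
ucb = mean + beta * std              # optimistic (UCB) score per action
action = int(np.argmax(ucb))         # act greedily w.r.t. the UCB score
```

The key design point this sketch captures is that exploration is driven by epistemic uncertainty (ensemble disagreement) rather than by imitating the possibly suboptimal behavior policy that generated the data.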