
This document introduces CFGRL, a novel framework that bridges generative modeling, specifically diffusion guidance, and reinforcement learning. The core idea is to treat policy improvement as guiding a diffusion model, so that training remains as simple as supervised learning while performance can still exceed that of the initial dataset. CFGRL improves policies by combining a reference policy with an "optimality" distribution, and, crucially, the degree of improvement can be controlled at test time via a guidance weight, without retraining. The paper demonstrates CFGRL's effectiveness in offline reinforcement learning and as an enhancement to goal-conditioned behavioral cloning, where it consistently outperforms baselines across a range of tasks. A key advantage is that CFGRL achieves policy improvement without necessarily learning an explicit value function.
By Enoch H. Kang
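
As a rough illustration of the test-time control described above, here is a minimal sketch (in Python) of classifier-free-guidance-style sampling for an action diffusion policy. The network stub `denoise_fn`, the binary optimality flag, and the simplified sampler are assumptions made for illustration, not the paper's exact formulation; the point is that a single guidance weight w moves samples from the reference policy (w = 0) toward the optimality-conditioned policy (larger w) without any retraining.

```python
# Minimal sketch of classifier-free-guidance-style sampling for an action
# diffusion policy, in the spirit of CFGRL as summarized above.
# denoise_fn, the `optimal` flag, and the sampler are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
ACTION_DIM = 2
N_STEPS = 50  # number of reverse-diffusion steps


def denoise_fn(action, state, t, optimal):
    """Stand-in for a trained noise-prediction network eps_theta.

    `optimal=True`  -> prediction conditioned on the "optimality" event.
    `optimal=False` -> unconditional prediction (the reference policy).
    A real implementation would be one neural network trained with
    condition dropout so both branches share the same weights.
    """
    # Dummy computation so the sketch runs end to end.
    bias = 0.1 if optimal else 0.0
    return 0.5 * action + bias * np.ones_like(action) + 0.0 * state[:ACTION_DIM] * t


def sample_action(state, w):
    """Sample an action with guidance weight w.

    w = 0 recovers the reference (behavior-cloned) policy; larger w pushes
    samples toward the optimality-conditioned policy, giving test-time
    control over the degree of policy improvement.
    """
    a = rng.standard_normal(ACTION_DIM)  # start from Gaussian noise
    for t in np.linspace(1.0, 0.0, N_STEPS):
        eps_uncond = denoise_fn(a, state, t, optimal=False)
        eps_cond = denoise_fn(a, state, t, optimal=True)
        # Classifier-free guidance: extrapolate from the unconditional
        # prediction toward the optimality-conditioned one.
        eps = eps_uncond + w * (eps_cond - eps_uncond)
        a = a - (1.0 / N_STEPS) * eps  # simplified Euler-style denoising step
    return a


if __name__ == "__main__":
    state = np.zeros(4)
    for w in (0.0, 1.0, 3.0):
        print(f"w = {w}: action = {sample_action(state, w)}")
```

In this sketch the guidance weight is purely a sampling-time knob: the same denoising network is queried with and without the optimality condition, and w only changes how the two predictions are combined, which mirrors the summary's claim that the degree of improvement is adjustable without retraining.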