
This document introduces CFGRL, a novel framework that bridges generative modeling, specifically diffusion guidance, and reinforcement learning. The core idea is to treat policy improvement as guiding a diffusion model, so that training remains as simple as supervised learning while performance can still exceed that of the initial dataset. CFGRL improves policies by combining a reference policy with an "optimality" distribution, and, crucially, the degree of improvement can be controlled at test time via a guidance weight, without retraining. The paper demonstrates CFGRL's effectiveness in offline reinforcement learning and as an enhancement to goal-conditioned behavioral cloning, where it consistently outperforms baselines across a range of tasks. A key advantage is that CFGRL achieves policy improvement without necessarily learning an explicit value function.
By Enoch H. Kang
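
As a rough illustration of the test-time control described above, here is a minimal sketch (in Python) of classifier-free-guidance-style sampling for an action diffusion policy. The network stub `denoise_fn`, the binary optimality flag, and the simplified sampler are assumptions made for illustration, not the paper's exact formulation; the point is that a single guidance weight w moves samples from the reference policy (w = 0) toward the optimality-conditioned policy (larger w) without any retraining.

```python
# Minimal sketch of classifier-free-guidance-style sampling for an action
# diffusion policy, in the spirit of CFGRL as summarized above.
# denoise_fn, the `optimal` flag, and the sampler are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
ACTION_DIM = 2
N_STEPS = 50  # number of reverse-diffusion steps


def denoise_fn(action, state, t, optimal):
    """Stand-in for a trained noise-prediction network eps_theta.

    `optimal=True`  -> prediction conditioned on the "optimality" event.
    `optimal=False` -> unconditional prediction (the reference policy).
    A real implementation would be one neural network trained with
    condition dropout so both branches share the same weights.
    """
    # Dummy computation so the sketch runs end to end.
    bias = 0.1 if optimal else 0.0
    return 0.5 * action + bias * np.ones_like(action) + 0.0 * state[:ACTION_DIM] * t


def sample_action(state, w):
    """Sample an action with guidance weight w.

    w = 0 recovers the reference (behavior-cloned) policy; larger w pushes
    samples toward the optimality-conditioned policy, giving test-time
    control over the degree of policy improvement.
    """
    a = rng.standard_normal(ACTION_DIM)  # start from Gaussian noise
    for t in np.linspace(1.0, 0.0, N_STEPS):
        eps_uncond = denoise_fn(a, state, t, optimal=False)
        eps_cond = denoise_fn(a, state, t, optimal=True)
        # Classifier-free guidance: extrapolate from the unconditional
        # prediction toward the optimality-conditioned one.
        eps = eps_uncond + w * (eps_cond - eps_uncond)
        a = a - (1.0 / N_STEPS) * eps  # simplified Euler-style denoising step
    return a


if __name__ == "__main__":
    state = np.zeros(4)
    for w in (0.0, 1.0, 3.0):
        print(f"w = {w}: action = {sample_action(state, w)}")
```

In this sketch the guidance weight is purely a sampling-time knob: the same denoising network is queried with and without the optimality condition, and w only changes how the two predictions are combined, which mirrors the summary's claim that the degree of improvement is adjustable without retraining.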