May 11, 2026

OGPO: Sample Efficient Full-Finetuning of Generative Control Policies

22 minutes

This paper introduces Off-Policy Generative Policy Optimization (OGPO), a reinforcement learning algorithm for efficiently fine-tuning generative control policies (GCPs) on complex robotic tasks. OGPO treats action generation as a denoising MDP nested within the environment process and uses off-policy critics as terminal rewards, which lets it optimize the full generative process without expensive backpropagation. The approach reconciles sample efficiency with the expressiveness of generative policies, and it outperforms existing techniques such as residual learning and simple policy steering. Enhanced versions such as OGPO+ and OGPO+CA add success-based regularization and conservative advantages to mitigate critic over-exploitation and the performance dips that occur during the transition from offline to online learning. The research demonstrates that OGPO can fine-tune poorly initialized models to near-perfect success rates in contact-rich manipulation environments, even when expert data is unavailable during the online phase.
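For listeners who want a concrete picture of the "denoising MDP with a critic as terminal reward" idea, here is a minimal Python sketch. It is not the paper's implementation: the critic, the denoiser, the step count, and every function name (`toy_critic`, `toy_denoiser_mean`, `denoising_episode`) are hypothetical stand-ins chosen for illustration. It only shows the structure the summary describes: each denoising step is a transition with its own log-probability, intermediate rewards are zero, and the final step receives the critic's value of the generated action.

```python
import math
import random

def toy_critic(state, action):
    # Hypothetical stand-in for a learned off-policy Q-function:
    # actions closer to the state score higher.
    return -sum((a - s) ** 2 for a, s in zip(action, state))

def toy_denoiser_mean(state, x, k):
    # Hypothetical denoiser: move the current sample halfway toward the state.
    # The step index k is unused here but kept to mirror a step-conditioned model.
    return [xi + 0.5 * (si - xi) for xi, si in zip(x, state)]

def gaussian_logp(value, mean, sigma):
    # Log-density of an isotropic Gaussian, summed over dimensions.
    return sum(
        -0.5 * ((v - m) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
        for v, m in zip(value, mean)
    )

def denoising_episode(state, num_steps, sigma, rng):
    """Generate one action as a short inner episode of denoising steps."""
    x = [rng.gauss(0.0, 1.0) for _ in state]  # start from pure noise
    transitions = []
    for k in range(num_steps):
        mean = toy_denoiser_mean(state, x, k)
        x = [rng.gauss(m, sigma) for m in mean]  # stochastic denoising step
        transitions.append(
            {"step": k, "logp": gaussian_logp(x, mean, sigma), "reward": 0.0}
        )
    # Only the last denoising step is rewarded, using the critic's value
    # of the final action in the current environment state.
    transitions[-1]["reward"] = toy_critic(state, x)
    return x, transitions

rng = random.Random(0)
state = [0.3, -0.2]
action, transitions = denoising_episode(state, num_steps=4, sigma=0.1, rng=rng)
rewards = [t["reward"] for t in transitions]
```

In this toy setup, a policy-gradient-style update would weight each step's `logp` by the terminal reward, so the critic is only evaluated, never differentiated through the denoising chain. Whether OGPO uses exactly this estimator is not stated in the summary above.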

By Enoch H. Kang
