
This paper provides a formal theoretical framework for success conditioning, a reinforcement learning heuristic widely used in Decision Transformers and language model alignment. The author proves that this technique is not merely a heuristic but exactly solves a trust-region optimization problem under a chi-squared divergence constraint. A central contribution is the Action-Influence Identity, which shows that the magnitude of policy improvement equals the statistical variability in success rates attributable to the behavior policy's choice of actions. This identity reveals that success conditioning is inherently conservative: it avoids dangerous distribution shift by design, and it fails only by becoming overly cautious when the data carry too little signal. Furthermore, the research explains how return thresholding acts as a proxy that can amplify these improvements, provided the chosen success criterion remains aligned with the true objective. Ultimately, the work bridges the gap between simple supervised fine-tuning on successful outcomes and the rigorous mathematical foundations of policy optimization.
By Enoch H. Kang
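
To make the mechanism concrete, here is a minimal one-step sketch of success conditioning in a toy bandit setting. The behavior policy, the per-action success probabilities, and all numbers are illustrative assumptions, not values from the paper. By Bayes' rule, conditioning the behavior policy on success reweights each action by its success probability; the resulting improvement then matches a normalized form of the Action-Influence Identity described above.

```python
import numpy as np

# Toy one-step illustration of success conditioning (a sketch, not the
# paper's implementation). Assumed setup: three actions, a behavior
# policy beta, and a known success probability for each action.
beta = np.array([0.5, 0.3, 0.2])        # behavior policy pi_beta(a)
p_success = np.array([0.2, 0.6, 0.9])   # P(success | a)

# Overall success rate of the behavior policy.
base_rate = beta @ p_success

# Success conditioning: by Bayes' rule, P(a | success) is proportional
# to pi_beta(a) * P(success | a).
pi_cond = beta * p_success / base_rate

# Success rate of the conditioned policy.
new_rate = pi_cond @ p_success

# Action-Influence Identity (in this toy form): the improvement equals
# the variance of per-action success rates under the behavior policy,
# normalized by the base success rate.
variance = beta @ (p_success - base_rate) ** 2
print(new_rate - base_rate)   # ~0.1704
print(variance / base_rate)   # matches the improvement exactly
```

Both printed values come out to roughly 0.1704 here: when the behavior policy's actions differ a lot in their success rates, conditioning on success buys a large improvement, and when they barely differ, it buys almost nothing, which is exactly the conservative behavior described above. The paper's precise statement of the identity may normalize differently, but the variance-driven character of the improvement is the point this sketch illustrates.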