Neural intel Pod

Reinforcement Learning Under Unmeasured Confounding



This paper introduces a novel framework for offline reinforcement learning (RL) that addresses two challenges at once: continuous action spaces and unmeasured confounding variables. The authors develop a method for nonparametric estimation of policy value in an infinite-horizon setting by establishing a new identification result that uses "reward-inducing proxy variables." Building on this result, they propose a minimax estimator of policy value and a policy-gradient-based algorithm for finding optimal policies, with theoretical guarantees for consistency and error bounds. The methodology's effectiveness is demonstrated through extensive simulations and a real-world application to the German Family Panel data, where it is used to identify strategies for improving long-term relationship satisfaction.
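To give a feel for the policy-gradient component, here is a minimal toy sketch (not the paper's method): a one-dimensional Gaussian policy whose mean is updated by a REINFORCE-style score-function gradient against a stand-in value function. In the paper, the value signal would instead come from the proxy-variable-based minimax estimator; the quadratic `reward` below, the optimum at 2.0, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(a):
    # Stand-in for an estimated policy value; hypothetical, with its
    # maximum at action a = 2.0. The paper uses a minimax estimator here.
    return -(a - 2.0) ** 2

def policy_gradient_step(mu, sigma=0.5, n=256, lr=0.05):
    # Score-function (REINFORCE) gradient for a Gaussian policy N(mu, sigma^2):
    # grad_mu E[reward] = E[reward(a) * (a - mu) / sigma^2].
    a = rng.normal(mu, sigma, size=n)
    score = (a - mu) / sigma**2
    grad = np.mean(reward(a) * score)  # Monte Carlo gradient estimate
    return mu + lr * grad              # gradient ascent on policy value

mu = 0.0
for _ in range(200):
    mu = policy_gradient_step(mu)
# mu should end close to the optimal action 2.0
print(mu)
```

The score-function trick is what makes continuous action spaces tractable here: the gradient is estimated from sampled actions alone, without differentiating through the (unknown) value function.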


By Neural Intelligence Network