Neural intel Pod

Reinforcement Learning Under Unmeasured Confounding



This paper introduces a novel framework for offline reinforcement learning (RL) that addresses two challenges at once: continuous action spaces and unmeasured confounding variables. The authors develop a method for nonparametric estimation of policy value in an infinite-horizon setting by establishing a new identification result that uses "reward-inducing proxy variables." Building on this result, they propose a minimax estimator of policy value and a policy-gradient-based algorithm for finding optimal policies, with theoretical guarantees for consistency and error bounds. The methodology's effectiveness is demonstrated through extensive simulations and a real-world application to the German Family Panel data, where it is used to identify strategies for improving long-term relationship satisfaction.
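To give a feel for the policy-gradient component, here is a minimal toy sketch (not the paper's method): a one-dimensional Gaussian policy whose mean is updated by a REINFORCE-style score-function gradient against a stand-in value function. In the paper, the value signal would instead come from the proxy-variable-based minimax estimator; the quadratic `reward` below, the optimum at 2.0, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(a):
    # Stand-in for an estimated policy value; hypothetical, with its
    # maximum at action a = 2.0. The paper uses a minimax estimator here.
    return -(a - 2.0) ** 2

def policy_gradient_step(mu, sigma=0.5, n=256, lr=0.05):
    # Score-function (REINFORCE) gradient for a Gaussian policy N(mu, sigma^2):
    # grad_mu E[reward] = E[reward(a) * (a - mu) / sigma^2].
    a = rng.normal(mu, sigma, size=n)
    score = (a - mu) / sigma**2
    grad = np.mean(reward(a) * score)  # Monte Carlo gradient estimate
    return mu + lr * grad              # gradient ascent on policy value

mu = 0.0
for _ in range(200):
    mu = policy_gradient_step(mu)
# mu should end close to the optimal action 2.0
print(mu)
```

The score-function trick is what makes continuous action spaces tractable here: the gradient is estimated from sampled actions alone, without differentiating through the (unknown) value function.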


By Neural Intelligence Network