LG, ME · Sep 18, 2022

Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes

arXiv:2209.08666v1 · 18 citations · h-index: 48
Originality: Incremental advance
AI Analysis

This paper addresses confounded decision-making in offline RL for domains such as healthcare, although the contribution is incremental, building on existing identification and pessimism techniques.

The paper tackles offline reinforcement learning with unmeasured confounders: it uses instrumental variables to identify expected rewards, proposes policy learning methods with finite-sample guarantees, and demonstrates their performance in a numerical study motivated by kidney transplantation.
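The identification idea can be illustrated in the simplest one-step (bandit) case. The sketch below is a hypothetical toy model, not the paper's confounded-MDP estimator. An unmeasured confounder `u` drives both the action and the reward, while a binary instrument `z` shifts the action but affects the reward only through it. Under a linear, constant-effect reward model (an assumption made here for simplicity), regressing the reward on the action is biased, whereas the Wald/IV ratio Cov(z, r) / Cov(z, a) recovers the action effect. All function names and parameter values are illustrative.

```python
import numpy as np


def simulate(n, beta=1.0, gamma=2.0, seed=0):
    """Hypothetical one-step confounded data; not the paper's MDP model."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)  # unmeasured confounder
    z = rng.integers(0, 2, size=n)  # binary instrument, independent of u
    # The action depends on both the instrument and the confounder.
    a = (0.8 * z + 0.8 * u + rng.normal(scale=0.5, size=n) > 0.5).astype(float)
    # The reward depends on the action (effect beta) and on u (source of bias).
    r = beta * a + gamma * u + rng.normal(scale=0.5, size=n)
    return z, a, r


def naive_slope(a, r):
    """OLS slope of r on a; biased because u affects both a and r."""
    return np.cov(a, r)[0, 1] / np.var(a, ddof=1)


def iv_wald(z, a, r):
    """Wald/IV estimate Cov(z, r) / Cov(z, a); consistent for beta in this toy model."""
    return np.cov(z, r)[0, 1] / np.cov(z, a)[0, 1]


if __name__ == "__main__":
    z, a, r = simulate(200_000)
    print("naive:", naive_slope(a, r))
    print("iv:   ", iv_wald(z, a, r))
```

With the true effect `beta=1.0`, the naive slope is pulled well above 1 by the confounder, while the IV ratio lands close to 1.0. The paper's identification results target the harder sequential setting, where the quantity of interest is the expected total reward over many steps.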

We study offline reinforcement learning (RL) in the face of unmeasured confounders. Due to the lack of online interaction with the environment, offline RL faces two significant challenges: (i) the agent may be confounded by unobserved state variables; (ii) the offline data collected a priori do not provide sufficient coverage of the environment. To tackle these challenges, we study policy learning in confounded MDPs with the aid of instrumental variables. Specifically, we first establish value function (VF)-based and marginalized importance sampling (MIS)-based identification results for the expected total reward in confounded MDPs. Then, by leveraging pessimism and our identification results, we propose various policy learning methods with finite-sample suboptimality guarantees for finding the optimal in-class policy under minimal data coverage and modeling assumptions. Lastly, our extensive theoretical investigations and a numerical study motivated by kidney transplantation demonstrate the promising performance of the proposed methods.
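The role of pessimism under limited coverage can likewise be sketched in a two-action bandit. This is a generic lower-confidence-bound (LCB) rule, not the paper's policy learning method: each action is scored by its sample mean minus `confidence` standard errors, so an action observed only a few times is penalized even if its few recorded rewards happen to be high. The `confidence` multiplier of 2.0, the function names, and the toy data are assumptions for illustration.

```python
import numpy as np


def greedy_choice(rewards_by_action):
    """Pick the action with the highest sample mean (no coverage penalty)."""
    means = [float(np.mean(r)) for r in rewards_by_action]
    return int(np.argmax(means)), means


def pessimistic_choice(rewards_by_action, confidence=2.0):
    """Pick the action with the highest lower confidence bound."""
    lcbs = []
    for r in rewards_by_action:
        r = np.asarray(r, dtype=float)
        se = r.std(ddof=1) / np.sqrt(len(r))  # standard error of the mean
        lcbs.append(float(r.mean() - confidence * se))
    return int(np.argmax(lcbs)), lcbs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = [
        rng.normal(0.5, 1.0, size=5000),  # action 0: well covered by the data
        [2.0, -1.0, 3.5],  # action 1: only three samples
    ]
    print("greedy:     ", greedy_choice(data))
    print("pessimistic:", pessimistic_choice(data))
```

Greedy selection picks the rarely observed action 1 (sample mean 1.5), while the LCB rule prefers the well-covered action 0. This illustrates why pessimism helps when the offline data do not cover every action well: only value estimates that the data actually support are trusted.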
