ML · LG · Feb 19, 2021

Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning

Luofeng Liao, Zuyue Fu, Zhuoran Yang, Yixin Wang, Mladen Kolar, Zhaoran Wang

arXiv:2102.09907v3 · 21.140 citations

Originality: Highly original

AI Analysis

This paper addresses causal inference in offline RL for domains with unobserved confounding and offers a novel solution with theoretical guarantees.

The paper tackles the problem of learning optimal policies from observational data in offline reinforcement learning when actions are confounded by unobserved variables. It proposes an instrumental variable-aided value iteration algorithm that identifies the transition dynamics from observational data and is provably efficient.

In offline reinforcement learning (RL) an optimal policy is learned solely from a priori collected observational data. However, in observational data, actions are often confounded by unobserved variables. Instrumental variables (IVs), in the context of RL, are the variables whose influence on the state variables is all mediated by the action. When a valid instrument is present, we can recover the confounded transition dynamics through observational data. We study a confounded Markov decision process where the transition dynamics admit an additive nonlinear functional form. Using IVs, we derive a conditional moment restriction through which we can identify transition dynamics based on observational data. We propose a provably efficient IV-aided Value Iteration (IVVI) algorithm based on a primal-dual reformulation of the conditional moment restriction. To our knowledge, this is the first provably efficient algorithm for instrument-aided offline RL.
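To make the role of the instrument concrete, here is a minimal sketch in Python. It is not the paper's IVVI algorithm: it swaps in ordinary linear two-stage least squares (2SLS) on a simulated one-step transition, whereas the abstract describes a nonlinear additive model solved through a primal-dual reformulation. The data-generating coefficients, the variable names (`z`, `u`, `s`, `a`, `s_next`), and the helper functions are all assumptions made for illustration. The simulation respects the definition above, in that the instrument `z` influences the next state only through the action.

```python
import numpy as np


def simulate(n=20000, seed=0):
    """Simulate one confounded transition step (hypothetical linear model)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)  # instrument: affects s_next only via the action
    u = rng.normal(size=n)  # unobserved confounder: affects action and s_next
    s = rng.normal(size=n)  # current state
    a = 0.8 * z + u + 0.3 * s + 0.1 * rng.normal(size=n)
    # Assumed true dynamics: state coefficient 0.5, action coefficient 2.0.
    s_next = 0.5 * s + 2.0 * a + 3.0 * u + 0.1 * rng.normal(size=n)
    return s, a, z, s_next


def ols(X, y):
    """Least-squares coefficients for y ~ X (no intercept; data are zero-mean)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def two_stage_least_squares(s, a, z, s_next):
    """Linear 2SLS: project the action on (s, z), then regress s_next on (s, a_hat)."""
    W = np.column_stack([s, z])
    a_hat = W @ ols(W, a)  # stage 1: the part of the action explained by (s, z)
    X_hat = np.column_stack([s, a_hat])
    return ols(X_hat, s_next)  # stage 2: [state coefficient, action coefficient]


s, a, z, s_next = simulate()
naive = ols(np.column_stack([s, a]), s_next)  # ignores confounding
iv = two_stage_least_squares(s, a, z, s_next)

print("naive OLS action coefficient:", naive[1])
print("2SLS action coefficient:     ", iv[1])
```

In this toy setting the naive regression overstates the action's effect because the confounder `u` drives both the action and the next state, while the instrumented estimate should land close to the assumed true coefficient of 2.0. The linear case is only meant to show why a valid instrument helps identify the transition dynamics from observational data.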
