ML LG Jun 27, 2024

Off-policy Evaluation with Deeply-abstracted States

Meiling Hao, Pingfan Su, Liyuan Hu, Zoltan Szabo, Qingyuan Zhao, Chengchun Shi

arXiv:2406.19531v39.22 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of efficient offline policy assessment for reinforcement learning practitioners, though it appears incremental as it adapts existing state abstraction concepts to OPE.

The paper tackles the challenge of accurate off-policy evaluation (OPE) in large state spaces by applying state abstractions, originally used for policy learning, to OPE. It introduces an iterative procedure that produces deeply-abstracted states, reducing the sample complexity of OPE, and it proves Fisher consistency of OPE estimators on the resulting abstract state spaces.

Off-policy evaluation (OPE) is crucial for assessing a target policy's impact offline before its deployment. However, achieving accurate OPE in large state spaces remains challenging. This paper studies state abstractions, originally designed for policy learning, in the context of OPE. Our contributions are three-fold: (i) We define a set of irrelevance conditions central to learning state abstractions for OPE, and derive a backward-model-irrelevance condition for achieving irrelevance in (marginalized) importance sampling ratios by constructing a time-reversed Markov decision process (MDP). (ii) We propose a novel iterative procedure that sequentially projects the original state space into a smaller space, resulting in a deeply-abstracted state, which substantially simplifies the sample complexity of OPE arising from high cardinality. (iii) We prove the Fisher consistencies of various OPE estimators when applied to our proposed abstract state spaces.
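To make the idea of an irrelevance condition concrete, here is a minimal sketch in Python with NumPy. It is not the paper's procedure: it only illustrates the standard forward model-irrelevance notion on a toy tabular MDP, where ground states that share rewards and aggregated transition probabilities are merged by a map `phi`. All names and numbers (`phi`, `P_abs`, `R_abs`, `pi_abs`, `w`) are hypothetical, and the target policy is assumed to depend on the state only through `phi`.

```python
import numpy as np

def policy_value(P, R, pi, gamma):
    """Exact state values of policy pi in a tabular MDP (solves the Bellman equation)."""
    n = R.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state transitions under pi
    r_pi = (pi * R).sum(axis=1)            # expected one-step reward under pi
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

gamma = 0.9
phi = np.array([0, 0, 1, 1])  # hypothetical abstraction: 4 ground states -> 2 abstract states

# Hypothetical abstract MDP with 2 abstract states and 2 actions.
P_abs = np.array([[[0.7, 0.3], [0.2, 0.8]],
                  [[0.5, 0.5], [0.9, 0.1]]])  # indexed [z, a, z']
R_abs = np.array([[1.0, 0.0], [0.0, 2.0]])    # indexed [z, a]
pi_abs = np.array([[0.6, 0.4], [0.3, 0.7]])   # target policy, indexed [z, a]

# Ground MDP: each ground state spreads the abstract transition mass differently
# inside an abstract class, but the mass per class matches P_abs.
w = np.array([[0.9, 0.1, 0.4, 0.6],
              [0.2, 0.8, 0.5, 0.5],
              [0.3, 0.7, 1.0, 0.0],
              [0.5, 0.5, 0.25, 0.75]])        # within-class weights, sum to 1 per class
P = P_abs[phi][:, :, phi] * w[:, None, :]     # indexed [s, a, s']
R = R_abs[phi]                                # reward depends on s only through phi(s)
pi = pi_abs[phi]                              # policy depends on s only through phi(s)

V = policy_value(P, R, pi, gamma)             # value on the 4-state ground MDP
V_abs = policy_value(P_abs, R_abs, pi_abs, gamma)  # value on the 2-state abstract MDP

# Under these assumptions the two evaluations agree state by state.
print(np.allclose(V, V_abs[phi]))
```

In this toy setting the policy can be evaluated on two abstract states instead of four ground states without changing the answer, which is the kind of reduction in cardinality the abstract refers to. The backward condition and the iterative, deeply-abstracted construction described above are not implemented here.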
