ML AI LG · May 10, 2021

Deeply-Debiased Off-Policy Interval Estimation

Chengchun Shi, Runzhe Wan, Victor Chernozhukov, Rui Song

arXiv:2105.04646v220.143 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses uncertainty quantification in off-policy evaluation, which matters for applications such as reinforcement learning and causal inference. The contribution appears incremental, since it builds on existing debiasing methods.

The paper tackles the problem of constructing confidence intervals for off-policy evaluation. It proposes a deeply-debiasing procedure that yields efficient, robust, and flexible intervals, supported by theoretical analysis and numerical experiments.

Off-policy evaluation learns a target policy's value with a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications would benefit significantly from having a confidence interval (CI) that quantifies the uncertainty of the point estimate. In this paper, we propose a novel deeply-debiasing procedure to construct an efficient, robust, and flexible CI on a target policy's value. Our method is justified by theoretical results and numerical experiments. A Python implementation of the proposed procedure is available at https://github.com/RunzheStat/D2OPE.
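To make the setting in the abstract concrete, here is a minimal sketch of off-policy evaluation with a confidence interval. It is not the paper's deeply-debiased procedure and does not use the D2OPE code: it swaps in a plain importance-sampling estimator with a normal-approximation (Wald) interval on a toy one-step problem. The function name `is_value_ci`, the two-action setup, and all numbers are assumptions chosen for illustration only.

```python
import numpy as np

def is_value_ci(rewards, target_probs, behavior_probs, z=1.96):
    """Importance-sampling value estimate with a normal-approximation CI.

    Illustrative baseline only, not the deeply-debiased estimator.
    """
    rewards = np.asarray(rewards, dtype=float)
    # Reweight each logged reward by how much more (or less) likely the
    # target policy was to take the logged action than the behavior policy.
    weights = np.asarray(target_probs, dtype=float) / np.asarray(behavior_probs, dtype=float)
    terms = weights * rewards
    estimate = terms.mean()
    std_err = terms.std(ddof=1) / np.sqrt(len(terms))
    return estimate, (estimate - z * std_err, estimate + z * std_err)

# Hypothetical toy data: two actions, logged under a uniform behavior policy.
rng = np.random.default_rng(0)
n = 5000
behavior = np.array([0.5, 0.5])      # behavior policy's action probabilities
target = np.array([0.2, 0.8])        # target policy's action probabilities
mean_reward = np.array([0.0, 1.0])   # expected reward of each action

actions = rng.choice(2, size=n, p=behavior)
rewards = mean_reward[actions] + rng.normal(scale=0.1, size=n)

estimate, (lower, upper) = is_value_ci(rewards, target[actions], behavior[actions])
print(f"estimate={estimate:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

In this toy setup the target policy's true value is 0.2 × 0 + 0.8 × 1 = 0.8, so the printed estimate should land near that number. Sequential decision problems need estimators that account for state transitions over time, which is the harder setting the paper addresses.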
