ML · AI · LG · May 10, 2021

Deeply-Debiased Off-Policy Interval Estimation

arXiv:2105.04646v2 · 43 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses uncertainty quantification in off-policy evaluation, which matters for applications such as reinforcement learning and causal inference. It appears incremental because it builds on existing debiasing methods.

The paper tackles the problem of constructing confidence intervals for off-policy evaluation, proposing a deeply-debiasing procedure that yields efficient, robust, and flexible intervals, with results validated through theoretical analysis and numerical experiments.

Off-policy evaluation learns a target policy's value with a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications would benefit significantly from having a confidence interval (CI) that quantifies the uncertainty of the point estimate. In this paper, we propose a novel deeply-debiasing procedure to construct an efficient, robust, and flexible CI on a target policy's value. Our method is justified by theoretical results and numerical experiments. A Python implementation of the proposed procedure is available at https://github.com/RunzheStat/D2OPE.
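To make the setting concrete, here is a minimal sketch of off-policy evaluation with a confidence interval. It is not the deeply-debiased procedure from the paper or the D2OPE code: it uses plain importance sampling in a single-step (bandit-style) problem with a normal-approximation interval, and it assumes the behavior policy's action probabilities are known. The function name `is_confidence_interval` and the synthetic data are hypothetical, chosen only for illustration.

```python
import numpy as np


def is_confidence_interval(rewards, behavior_probs, target_probs, z=1.96):
    """Importance-sampling value estimate with a normal-approximation CI.

    Illustrative baseline only, not the paper's deeply-debiased estimator.
    behavior_probs / target_probs are the probabilities each policy assigns
    to the action that was actually logged.
    """
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(target_probs, dtype=float) / np.asarray(behavior_probs, dtype=float)
    terms = weights * rewards              # reweighted rewards
    estimate = terms.mean()                # point estimate of the target value
    std_err = terms.std(ddof=1) / np.sqrt(terms.size)
    return estimate, (estimate - z * std_err, estimate + z * std_err)


# Synthetic two-action example (all numbers are made up for illustration).
rng = np.random.default_rng(0)
n = 5000
behavior = np.array([0.5, 0.5])      # behavior policy: uniform over actions
target = np.array([0.2, 0.8])        # target policy: prefers action 1
mean_reward = np.array([0.0, 1.0])   # true value of target = 0.2*0 + 0.8*1 = 0.8

actions = rng.choice(2, size=n, p=behavior)
rewards = mean_reward[actions] + rng.normal(0.0, 0.1, size=n)

estimate, (lower, upper) = is_confidence_interval(
    rewards, behavior[actions], target[actions]
)
print(f"estimate={estimate:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

In sequential problems the importance weights multiply across time steps and their variance grows quickly, which is one reason more refined interval constructions, such as the debiasing approach proposed in the paper, are of interest.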

Code Implementations: 1 repo
