ML · LG · Jul 9, 2025

Off-Policy Evaluation Under Nonignorable Missing Data

arXiv:2507.06961v1 · 1 citation · h-index: 3 · ICML
Originality: Incremental advance
AI Analysis

This addresses a practical issue in offline reinforcement learning: real-world logged data are often incomplete. The contribution is incremental, however, since it extends existing OPE methods to handle missing data rather than introducing a new evaluation framework.

The paper tackles off-policy evaluation (OPE) when the logged data contain missing values. It shows that value estimates remain unbiased under ignorable missingness but can be biased under nonignorable missingness, and it proposes an inverse probability weighted (IPW) estimator that, in numerical experiments, yields more reliable value inference.
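The ignorable versus nonignorable distinction can be seen in a toy simulation. The sketch below is a single-step simplification (monotone dropout reduces to a missing reward), not the paper's MDP setting or code. The function names `simulate` and `complete_case_value`, the reward model N(a, 1), and the response probabilities are all hypothetical choices. A direct-method estimate of the policy "always play action 1" (true value 1.0) is computed from complete cases only.

```python
import random
import statistics


def simulate(n, mechanism, seed=0):
    """Hypothetical one-step logged data as (action, reward-or-None) pairs.

    Behaviour policy: uniform over actions {0, 1}.
    Reward: drawn from N(a, 1), so the target policy "always play action 1"
    has true value 1.0.
    mechanism="mar": recording the reward depends only on the logged action
        (ignorable given the observed data).
    mechanism="mnar": recording the reward depends on the reward itself
        (nonignorable).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a = rng.randint(0, 1)
        r = rng.gauss(float(a), 1.0)
        if mechanism == "mar":
            p_obs = 0.5 if a == 1 else 0.9
        elif mechanism == "mnar":
            p_obs = 0.9 if r > 1.0 else 0.3
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        observed = rng.random() < p_obs
        data.append((a, r if observed else None))
    return data


def complete_case_value(data):
    """Direct-method estimate for "always play action 1" from complete cases:
    the mean recorded reward among logged action-1 records."""
    rewards = [r for a, r in data if a == 1 and r is not None]
    return statistics.fmean(rewards)


if __name__ == "__main__":
    for mechanism in ("mar", "mnar"):
        estimate = complete_case_value(simulate(100_000, mechanism))
        print(f"{mechanism}: {estimate:.3f} (true value 1.0)")
```

Under the action-dependent ("mar") mechanism, recorded rewards for action 1 are still a representative sample, so the estimate should land near 1.0. Under the reward-dependent ("mnar") mechanism, high rewards are over-represented and the estimate is pulled upward (to roughly 1.4 in expectation for these made-up probabilities).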

Off-Policy Evaluation (OPE) aims to estimate the value of a target policy using offline data collected from potentially different policies. In real-world applications, however, logged data often suffers from missingness. While OPE has been extensively studied in the literature, a theoretical understanding of how missing data affects OPE results is still lacking. In this paper, we investigate OPE in the presence of monotone missingness and theoretically demonstrate that the value estimates remain unbiased under ignorable missingness but can be biased under nonignorable (informative) missingness. To retain consistency of value estimation, we propose an inverse probability weighted value estimator and conduct statistical inference to quantify the uncertainty of the estimates. Through a series of numerical experiments, we empirically demonstrate that our proposed estimator yields more reliable value inference under missing data.
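The idea behind inverse probability weighting can be sketched in a single-step toy setting; this is not the paper's estimator, which works with monotone missingness in sequential data. The sketch assumes the response probabilities are known (an oracle). Under nonignorable missingness they would have to be estimated, and the abstract does not say how, so that is the main gap between this toy and the actual method. The names `simulate_mnar` and `ipw_value`, the reward model N(a, 1), and the response probabilities are hypothetical. A normal-approximation (Wald) interval stands in for the paper's inference procedure, whose details are not given here.

```python
import math
import random
import statistics


def simulate_mnar(n, seed=1):
    """Hypothetical one-step logged data as (action, reward-or-None, p_obs).

    Behaviour policy: uniform over {0, 1}; reward ~ N(a, 1), so the target
    policy "always play action 1" has true value 1.0. The reward is recorded
    with probability 0.9 if r > 1 and 0.3 otherwise, i.e. missingness depends
    on the possibly unseen reward (nonignorable). p_obs is returned as an
    oracle; in practice it would have to be estimated.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = rng.randint(0, 1)
        r = rng.gauss(float(a), 1.0)
        p_obs = 0.9 if r > 1.0 else 0.3
        observed = rng.random() < p_obs
        rows.append((a, r if observed else None, p_obs))
    return rows


def ipw_value(rows, behaviour_prob=0.5, z=1.96):
    """Importance-sampling x inverse-probability-weighted value estimate of
    "always play action 1", with a normal-approximation (Wald) interval.

    Each record contributes pi(a)/b(a) * r / P(observed) when its reward is
    recorded and 0 otherwise; the estimate is the mean of these terms.
    """
    terms = []
    for a, r, p_obs in rows:
        ratio = (1.0 if a == 1 else 0.0) / behaviour_prob  # pi(a) / b(a)
        if r is None or ratio == 0.0:
            terms.append(0.0)  # unrecorded reward or action the target never takes
        else:
            terms.append(ratio * r / p_obs)  # reweight by 1 / P(observed)
    estimate = statistics.fmean(terms)
    se = statistics.stdev(terms) / math.sqrt(len(terms))
    return estimate, (estimate - z * se, estimate + z * se)


if __name__ == "__main__":
    est, (lo, hi) = ipw_value(simulate_mnar(100_000))
    print(f"IPW estimate {est:.3f}, 95% CI [{lo:.3f}, {hi:.3f}] (true value 1.0)")
```

Dividing each recorded reward by its response probability makes the expected contribution equal to that of the full data, so the estimate should center near 1.0 even though high rewards are recorded more often. A complete-case average of the same data would be biased upward.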

