ML · LG · Jun 3, 2025

Doubly-Robust Estimation of Counterfactual Policy Mean Embeddings

Houssam Zenati, Bariscan Bozkurt, Arthur Gretton

arXiv:2506.02793v2 · 12.33 citations · h-index: 9

Originality: Highly original

AI Analysis

This work addresses the need for accurate counterfactual outcome estimation in high-stakes applications such as healthcare and advertising, offering a nonparametric, doubly robust method for evaluating policies off-policy.

The paper tackles the problem of estimating outcome distributions under counterfactual policies, which matters for decision-making in domains such as recommendation and healthcare. It proposes a Counterfactual Policy Mean Embedding framework that enables flexible distributional off-policy evaluation, together with a doubly robust estimator that achieves improved convergence rates and supports efficient hypothesis testing and sampling.

Estimating the distribution of outcomes under counterfactual policies is critical for decision-making in domains such as recommendation, advertising, and healthcare. We propose and analyze a novel framework, Counterfactual Policy Mean Embedding (CPME), that represents the entire counterfactual outcome distribution in a reproducing kernel Hilbert space (RKHS), enabling flexible and nonparametric distributional off-policy evaluation. We introduce both a plug-in estimator and a doubly robust estimator; the latter enjoys improved convergence rates by correcting for bias in both the outcome embedding and propensity models. Building on this, we develop a doubly robust kernel test statistic for hypothesis testing, which achieves asymptotic normality and thus enables computationally efficient testing and straightforward construction of confidence intervals. Our framework also supports sampling from the counterfactual distribution. Numerical simulations illustrate the practical benefits of CPME over existing methods.
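
To make the plug-in and doubly robust estimators concrete, here is a minimal sketch of one standard way such embeddings can be computed. It is an illustration under stated assumptions, not the authors' implementation: actions are discrete, the logging propensities `pi0` are known, the outcome embedding is fit by kernel ridge regression with a Gaussian kernel on contexts and an indicator kernel on actions, and no sample splitting is used. The function name `dr_embedding_weights` and all parameters are hypothetical. Both estimators are returned as coefficient vectors `c`, so that the estimated embedding is the weighted sum of outcome features `sum_j c[j] * phi(y_j)`.

```python
import numpy as np


def rbf(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)


def dr_embedding_weights(X, A, pi0, pi, n_actions, lam=1e-2, gamma=1.0):
    """Coefficients on phi(y_j) for plug-in and doubly robust embeddings.

    X   : (n, d) contexts
    A   : (n,)   logged actions in {0, ..., n_actions - 1}
    pi0 : (n, n_actions) logging propensities pi0(a | x_i) (assumed known)
    pi  : (n, n_actions) target policy probabilities pi(a | x_i)
    """
    n = len(A)
    idx = np.arange(n)
    Kx = rbf(X, X, gamma)
    # Product kernel on (context, action): Gaussian times action indicator.
    K = Kx * (A[:, None] == A[None, :])
    M = K + n * lam * np.eye(n)

    # Plug-in: average the fitted conditional embedding over the target policy.
    c_plugin = np.zeros(n)
    for a in range(n_actions):
        # Entry (j, i) is k((x_j, a_j), (x_i, a)).
        K_query = Kx * (A[:, None] == a)
        beta = np.linalg.solve(M, K_query)  # column i: weights for mu_hat(x_i, a)
        c_plugin += beta @ pi[:, a] / n

    # Doubly robust: add the importance-weighted residual phi(y_i) - mu_hat(x_i, a_i).
    w = pi[idx, A] / pi0[idx, A]
    beta_obs = np.linalg.solve(M, K)  # column i: weights for mu_hat(x_i, a_i)
    c_dr = c_plugin + (w - beta_obs @ w) / n
    return c_plugin, c_dr
```

Under these assumptions, the expectation of a function `f` of the outcome under the target policy would be estimated as `np.sum(c_dr * f(Y))` for logged outcomes `Y`, and the kernel on outcomes only enters when the embedding is evaluated or compared. In this sketch, a very large `lam` shrinks the outcome model to zero, so the doubly robust coefficients reduce to plain importance weights `w / n`.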
