LG ML · Jul 22, 2019

Doubly robust off-policy evaluation with shrinkage

Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, Miroslav Dudík

arXiv:1907.09623v223.9130 citations

Originality: Incremental advance

AI Analysis

This work addresses off-policy evaluation for contextual bandits, offering improved estimators that are adaptive and effective in both standard and combinatorial settings, though it is incremental as it builds on existing doubly robust methods.

The paper tackles off-policy evaluation in contextual bandits by proposing a framework that shrinks importance weights to optimize the bias-variance tradeoff, yielding estimators that typically outperform state-of-the-art methods in the reported experiments.

We propose a new framework for designing estimators for off-policy evaluation in contextual bandits. Our approach is based on the asymptotically optimal doubly robust estimator, but we shrink the importance weights to minimize a bound on the mean squared error, which results in a better bias-variance tradeoff in finite samples. We use this optimization-based framework to obtain three estimators: (a) a weight-clipping estimator, (b) a new weight-shrinkage estimator, and (c) the first shrinkage-based estimator for combinatorial action sets. Extensive experiments in both standard and combinatorial bandit benchmark problems show that our estimators are highly adaptive and typically outperform state-of-the-art methods.
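
As a rough illustration of the idea in the abstract, the sketch below computes a doubly robust estimate in which the importance weight is passed through a shrinkage function before it multiplies the reward residual. The abstract does not give formulas, so everything here is an assumption: the function names are hypothetical, the clipping form `min(w, lam)` is the standard one, and the smooth form `lam * w / (w**2 + lam)` is only one plausible shrinkage that never exceeds `w` and recovers `w` as `lam` grows. The inputs are assumed to be precomputed per logged sample: the reward, the importance weight, the reward model's prediction for the logged action, and its expected prediction under the target policy.

```python
from typing import Callable, Sequence


def clip_weight(w: float, lam: float) -> float:
    """Weight clipping (assumed form): cap the importance weight at lam."""
    return min(w, lam)


def shrink_weight(w: float, lam: float) -> float:
    """Smooth shrinkage (assumed form): scale w by lam / (w^2 + lam).

    The factor is at most 1, so the result never exceeds w;
    it approaches w as lam grows and is 0 when lam is 0.
    """
    return lam * w / (w * w + lam)


def dr_estimate(
    rewards: Sequence[float],
    weights: Sequence[float],
    q_logged: Sequence[float],
    q_target: Sequence[float],
    shrink: Callable[[float], float] = lambda w: w,
) -> float:
    """Doubly robust value estimate with optionally shrunk weights.

    rewards[i]  : observed reward for logged sample i
    weights[i]  : importance weight pi(a_i | x_i) / mu(a_i | x_i)
    q_logged[i] : reward-model prediction for the logged action
    q_target[i] : reward-model prediction averaged over the target policy
    shrink      : map applied to each weight (identity gives plain DR)
    """
    n = len(rewards)
    if n == 0:
        raise ValueError("need at least one logged sample")
    total = 0.0
    for r, w, q_a, q_pi in zip(rewards, weights, q_logged, q_target):
        # Model-based baseline plus a (shrunk) weighted residual correction.
        total += q_pi + shrink(w) * (r - q_a)
    return total / n


if __name__ == "__main__":
    # Toy data, made up purely for illustration.
    rewards = [1.0, 0.0]
    weights = [4.0, 0.5]
    q_logged = [0.5, 0.5]
    q_target = [0.5, 0.5]
    lam = 2.0
    print("plain DR:   ", dr_estimate(rewards, weights, q_logged, q_target))
    print("clipped DR: ", dr_estimate(rewards, weights, q_logged, q_target,
                                      lambda w: clip_weight(w, lam)))
    print("shrunk DR:  ", dr_estimate(rewards, weights, q_logged, q_target,
                                      lambda w: shrink_weight(w, lam)))
```

In this sketch, setting `lam` to 0 zeroes every weight and leaves only the model-based term, while a very large `lam` leaves the weights essentially unchanged, so `lam` moves the estimate between a lower-variance, potentially biased regime and the unshrunk doubly robust estimate. How `lam` should be chosen is not covered here.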
