LG · ML · Jul 22, 2019

Doubly robust off-policy evaluation with shrinkage

arXiv:1907.09623v2 · 130 citations
AI Analysis

This work addresses off-policy evaluation for contextual bandits, offering estimators that adapt well in both standard and combinatorial settings. The contribution is incremental in that it builds on existing doubly robust methods.

The paper tackles off-policy evaluation in contextual bandits by proposing a framework that shrinks importance weights to improve the bias-variance tradeoff, yielding estimators that typically outperform state-of-the-art methods in experiments.

We propose a new framework for designing estimators for off-policy evaluation in contextual bandits. Our approach is based on the asymptotically optimal doubly robust estimator, but we shrink the importance weights to minimize a bound on the mean squared error, which results in a better bias-variance tradeoff in finite samples. We use this optimization-based framework to obtain three estimators: (a) a weight-clipping estimator, (b) a new weight-shrinkage estimator, and (c) the first shrinkage-based estimator for combinatorial action sets. Extensive experiments in both standard and combinatorial bandit benchmark problems show that our estimators are highly adaptive and typically outperform state-of-the-art methods.
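The idea in the abstract can be illustrated with a minimal sketch: compute a doubly robust estimate, but pass the importance weights through a shrinkage map first. Everything named here is an assumption for illustration and not taken from the paper's code: the function names, the argument layout, and the threshold `lam` are hypothetical. The clipping map `min(w, lam)` is the usual form of weight clipping; the smooth map `lam * w / (w**2 + lam)` is assumed here as one plausible shrinkage form. How `lam` is chosen (the abstract mentions minimizing a bound on the mean squared error) is not shown.

```python
import numpy as np

def clip_weights(w, lam):
    """Weight clipping: cap each importance weight at lam."""
    return np.minimum(w, lam)

def shrink_weights(w, lam):
    """Smooth shrinkage (assumed form): lam * w / (w^2 + lam).
    Never exceeds w, and approaches w as lam grows."""
    return lam * w / (w ** 2 + lam)

def dr_estimate(rewards, w, q_logged, v_target, weight_map=None, lam=None):
    """Doubly robust value estimate with optional weight shrinkage.

    rewards  : observed reward r_i for each logged action
    w        : importance weight pi(a_i | x_i) / mu(a_i | x_i)
    q_logged : reward-model prediction for the logged action a_i
    v_target : reward-model value of the target policy at x_i,
               i.e. sum over a of pi(a | x_i) * q_hat(x_i, a)
    """
    rewards = np.asarray(rewards, dtype=float)
    w = np.asarray(w, dtype=float)
    q_logged = np.asarray(q_logged, dtype=float)
    v_target = np.asarray(v_target, dtype=float)
    # With no weight_map this is the plain doubly robust estimator.
    w_hat = w if weight_map is None else weight_map(w, lam)
    # Model-based baseline plus a weighted correction on the residual.
    return float(np.mean(v_target + w_hat * (rewards - q_logged)))

# Toy data (made up for illustration only).
rewards = [1.0, 0.0]
w = [4.0, 0.5]
q_logged = [0.5, 0.5]
v_target = [0.5, 0.5]

plain = dr_estimate(rewards, w, q_logged, v_target)
clipped = dr_estimate(rewards, w, q_logged, v_target, clip_weights, lam=1.0)
shrunk = dr_estimate(rewards, w, q_logged, v_target, shrink_weights, lam=1.0)
print(plain, clipped, shrunk)
```

On this toy data the large weight of 4.0 dominates the plain estimate; both maps pull it down, which reduces variance at the cost of some bias. Setting `lam` very large recovers the plain doubly robust estimate in both cases.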

Code Implementations: 1 repo