LG, ML · Jun 29, 2018

Bayesian Counterfactual Risk Minimization

arXiv:1806.11500v6 · 38 citations
Originality: Highly original
AI Analysis

This work addresses offline learning challenges in bandit feedback settings, offering a practical improvement for machine learning applications in domains like recommendation systems.

The authors tackled the problem of offline learning from logged bandit feedback by proposing a Bayesian view of counterfactual risk minimization, which led to a new generalization bound and a novel regularization technique. Experimental results showed this technique outperforms standard L2 regularization and is competitive with variance regularization while being simpler and more efficient.

We present a Bayesian view of counterfactual risk minimization (CRM) for offline learning from logged bandit feedback. Using PAC-Bayesian analysis, we derive a new generalization bound for the truncated inverse propensity score estimator. We apply the bound to a class of Bayesian policies, which motivates a novel, potentially data-dependent, regularization technique for CRM. Experimental results indicate that this technique outperforms standard $L_2$ regularization, and that it is competitive with variance regularization while being both simpler to implement and more computationally efficient.
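To make the estimator named in the abstract concrete, here is a minimal sketch of a truncated inverse propensity score (IPS) risk estimate. It assumes one common form of truncation, in which the logged propensity is floored at a threshold `tau` before dividing; the abstract does not spell out the exact form, so the function name, argument names, and toy data below are all illustrative assumptions rather than the authors' implementation.

```python
def truncated_ips_risk(losses, target_probs, logged_probs, tau):
    """Truncated IPS estimate of a policy's risk from logged bandit feedback.

    losses[i]       : loss observed for the logged action (assumed in [0, 1])
    target_probs[i] : probability the evaluated policy assigns to that action
    logged_probs[i] : propensity the logging policy assigned to that action
    tau             : truncation floor in (0, 1]; an assumed form of truncation
    """
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    if not (len(losses) == len(target_probs) == len(logged_probs)):
        raise ValueError("all inputs must have the same length")
    if len(losses) == 0:
        raise ValueError("need at least one logged interaction")

    total = 0.0
    for loss, pi, p in zip(losses, target_probs, logged_probs):
        # Flooring the propensity at tau caps each importance weight at 1 / tau,
        # which trades some bias for lower variance.
        total += loss * pi / max(p, tau)
    return total / len(losses)


# Hypothetical logged data: four interactions, one with a very small propensity.
losses = [1.0, 0.0, 1.0, 0.0]
target_probs = [0.5, 0.9, 0.2, 0.1]
logged_probs = [0.25, 0.5, 0.01, 0.4]

untruncated = truncated_ips_risk(losses, target_probs, logged_probs, tau=0.01)
truncated = truncated_ips_risk(losses, target_probs, logged_probs, tau=0.1)
print(untruncated, truncated)
```

On this toy data the single low-propensity interaction dominates the estimate when `tau` is at or below the smallest propensity (about 5.5), while flooring at `tau=0.1` brings it down to about 1.0. That sensitivity to a few large importance weights is why truncation is applied before bounding or regularizing the estimator.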

