LG MLApr 4, 2019

Empirical Bayes Regret Minimization

Chih-Wei Hsu, Branislav Kveton, Ofer Meshi, Martin Mladenov, Csaba Szepesvari

arXiv:1904.02664v410.716 citationsh-index: 57

Originality Highly original

AI Analysis

This addresses the practical inefficiency of theoretical bandit algorithms for users in reinforcement learning and decision-making domains, representing a novel approach rather than an incremental improvement.

The paper tackled the problem of bandit algorithms being too conservative in practice by introducing algorithm design through minimizing empirical Bayes regret, and reported several-fold reductions in Bayes regret for state-of-the-art bandit algorithms by optimizing over a small sample from a distribution.

Most bandit algorithm designs are purely theoretical. Therefore, they have strong regret guarantees, but also are often too conservative in practice. In this work, we pioneer the idea of algorithm design by minimizing the empirical Bayes regret, the average regret over problem instances sampled from a known distribution. We focus on a tractable instance of this problem, the confidence interval and posterior width tuning, and propose an efficient algorithm for solving it. The tuning algorithm is analyzed and evaluated in multi-armed, linear, and generalized linear bandits. We report several-fold reductions in Bayes regret for state-of-the-art bandit algorithms, simply by optimizing over a small sample from a distribution.

View on arXiv PDF

Similar