ML LG STFeb 23, 2017

A minimax and asymptotically optimal algorithm for stochastic bandits

arXiv:1702.07211v212.864 citations

Originality Highly original

AI Analysis

This solves a long-standing theoretical problem in bandit optimization, merging two research lines with potential applications in online learning and decision-making.

The authors tackled the problem of regret minimization in stochastic bandit models by proposing the kl-UCB++ algorithm, which they proved is simultaneously asymptotically optimal (matching Lai and Robbins' lower bound) and minimax optimal—the first algorithm shown to achieve both properties.

We propose the kl-UCB ++ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the sense of Lai and Robbins' lower bound) and minimax optimal. This is the first algorithm proved to enjoy these two properties at the same time. This work thus merges two different lines of research with simple and clear proofs.

View on arXiv PDF

Similar