MLLGSTFeb 23, 2017

A minimax and asymptotically optimal algorithm for stochastic bandits

arXiv:1702.07211v264 citations
AI Analysis

This solves a long-standing theoretical problem in bandit optimization, merging two research lines with potential applications in online learning and decision-making.

The authors tackled the problem of regret minimization in stochastic bandit models by proposing the kl-UCB++ algorithm, which they proved is simultaneously asymptotically optimal (matching Lai and Robbins' lower bound) and minimax optimal—the first algorithm shown to achieve both properties.

We propose the kl-UCB ++ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the sense of Lai and Robbins' lower bound) and minimax optimal. This is the first algorithm proved to enjoy these two properties at the same time. This work thus merges two different lines of research with simple and clear proofs.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes