LG MLJun 20, 2020

An Optimal Elimination Algorithm for Learning a Best Arm

Avinatan Hassidim, Ron Kupfer, Yaron Singer

arXiv:2006.11647v17.215 citations

Originality Highly original

AI Analysis

This addresses a fundamental gap in understanding worst-case sample complexity for statisticians and learning theorists, representing a strong specific gain rather than incremental progress.

The paper tackles the problem of (ε,δ)-PAC learning a best arm in multi-armed bandits, achieving an algorithm with sample complexity that converges exactly to the optimal sample complexity for learning the means of n arms separately, supported by a conditional matching lower bound.

We consider the classic problem of $(ε,δ)$-PAC learning a best arm where the goal is to identify with confidence $1-δ$ an arm whose mean is an $ε$-approximation to that of the highest mean arm in a multi-armed bandit setting. This problem is one of the most fundamental problems in statistics and learning theory, yet somewhat surprisingly its worst-case sample complexity is not well understood. In this paper, we propose a new approach for $(ε,δ)$-PAC learning a best arm. This approach leads to an algorithm whose sample complexity converges to \emph{exactly} the optimal sample complexity of $(ε,δ)$-learning the mean of $n$ arms separately and we complement this result with a conditional matching lower bound. More specifically:

View on arXiv PDF

Similar