MLLGMay 29, 2016

Tight (Lower) Bounds for the Fixed Budget Best Arm Identification Bandit Problem

arXiv:1605.09004v1170 citations
Originality Highly original
AI Analysis

This provides fundamental theoretical limits for a core bandit problem, with implications for algorithm design and analysis in sequential decision-making.

The paper tackles the fixed budget best arm identification problem in stochastic bandits, proving a lower bound showing that any strategy must misidentify the best arm with probability at least exp(-T/(log(K)H)), which disproves the belief that error probability could be bounded by exp(-T/H). This result closes the gap between upper and lower bounds, confirming optimality of some existing strategies.

We consider the problem of \textit{best arm identification} with a \textit{fixed budget $T$}, in the $K$-armed stochastic bandit setting, with arms distribution defined on $[0,1]$. We prove that any bandit strategy, for at least one bandit problem characterized by a complexity $H$, will misidentify the best arm with probability lower bounded by $$\exp\Big(-\frac{T}{\log(K)H}\Big),$$ where $H$ is the sum for all sub-optimal arms of the inverse of the squared gaps. Our result disproves formally the general belief - coming from results in the fixed confidence setting - that there must exist an algorithm for this problem whose probability of error is upper bounded by $\exp(-T/H)$. This also proves that some existing strategies based on the Successive Rejection of the arms are optimal - closing therefore the current gap between upper and lower bounds for the fixed budget best arm identification problem.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes