LGMLJun 19, 2021

Variance-Dependent Best Arm Identification

arXiv:2106.10417v311 citations
Originality Highly original
AI Analysis

This work addresses the variance-dependent best arm identification problem for stochastic bandits, offering a significant improvement over prior variance-independent algorithms in favorable scenarios.

The paper tackles the problem of identifying the best arm in a stochastic multi-armed bandit by proposing an adaptive algorithm that uses variance-dependent sampling, achieving a sample complexity of O(∑(σ_i²/Δ_i² + 1/Δ_i)(ln δ⁻¹ + ln ln Δ_i⁻¹)) and removing an extra ln n factor compared to state-of-the-art methods, with a matching lower bound proving optimality up to doubly logarithmic terms.

We study the problem of identifying the best arm in a stochastic multi-armed bandit game. Given a set of $n$ arms indexed from $1$ to $n$, each arm $i$ is associated with an unknown reward distribution supported on $[0,1]$ with mean $θ_i$ and variance $σ_i^2$. Assume $θ_1 > θ_2 \geq \cdots \geqθ_n$. We propose an adaptive algorithm which explores the gaps and variances of the rewards of the arms and makes future decisions based on the gathered information using a novel approach called \textit{grouped median elimination}. The proposed algorithm guarantees to output the best arm with probability $(1-δ)$ and uses at most $O \left(\sum_{i = 1}^n \left(\frac{σ_i^2}{Δ_i^2} + \frac{1}{Δ_i}\right)(\ln δ^{-1} + \ln \ln Δ_i^{-1})\right)$ samples, where $Δ_i$ ($i \geq 2$) denotes the reward gap between arm $i$ and the best arm and we define $Δ_1 = Δ_2$. This achieves a significant advantage over the variance-independent algorithms in some favorable scenarios and is the first result that removes the extra $\ln n$ factor on the best arm compared with the state-of-the-art. We further show that $Ω\left( \sum_{i = 1}^n \left( \frac{σ_i^2}{Δ_i^2} + \frac{1}{Δ_i} \right) \ln δ^{-1} \right)$ samples are necessary for an algorithm to achieve the same goal, thereby illustrating that our algorithm is optimal up to doubly logarithmic terms.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes