STLGMLFeb 6, 2013

Bounded regret in stochastic multi-armed bandits

arXiv:1302.1611v296 citations
Originality Incremental advance
AI Analysis

This addresses a theoretical problem in online decision-making for researchers, providing incremental improvements in regret bounds under specific knowledge assumptions.

The paper tackles the stochastic multi-armed bandit problem by proposing a new randomized policy that achieves uniformly bounded regret over time when both the optimal arm value and a lower bound on the gap are known, with results including lower bounds showing bounded regret is impossible with only partial knowledge.

We study the stochastic multi-armed bandit problem when one knows the value $μ^{(\star)}$ of an optimal arm, as a well as a positive lower bound on the smallest positive gap $Δ$. We propose a new randomized policy that attains a regret {\em uniformly bounded over time} in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows $Δ$, and bounded regret of order $1/Δ$ is not possible if one only knows $μ^{(\star)}$

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes