DS · LG · ML · Dec 25, 2017

Stochastic Multi-armed Bandits in Constant Space

arXiv:1712.09007v2 · 37 citations
Originality: Highly original
AI Analysis

This work addresses the challenge of limited memory in bandit algorithms, which is crucial for applications in resource-constrained environments like embedded systems or large-scale streaming data.

The paper studies the stochastic multi-armed bandit problem under sublinear space constraints, where storing a win-loss record for every arm is infeasible. It presents an algorithm that uses a constant number of words of space and, when rewards are bounded away from $0$ and $1$, achieves regret within an $O(\log(1/\Delta))$ factor of the optimal regret attainable without space constraints.

We consider the stochastic bandit problem in the sublinear space setting, where one cannot record the win-loss record for all $K$ arms. We give an algorithm using $O(1)$ words of space with regret \[ \sum_{i=1}^{K} \frac{1}{\Delta_i} \log \frac{\Delta_i}{\Delta} \log T \] where $\Delta_i$ is the gap between the best arm and arm $i$ and $\Delta$ is the gap between the best and the second-best arms. If the rewards are bounded away from $0$ and $1$, this is within an $O(\log(1/\Delta))$ factor of the optimum regret possible without space constraints.
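To get a feel for the regret expression in the abstract, the following minimal sketch evaluates it numerically for a given list of gaps. It is not the paper's algorithm, only the formula with constant factors dropped. The function names are hypothetical, the best arm (gap $0$) is skipped because its term is undefined, and the unconstrained benchmark $\sum_i \log T / \Delta_i$ is an assumption here: it is the familiar order of the optimal regret, which the abstract refers to but does not write out.

```python
import math


def constant_space_regret_expr(gaps, horizon):
    """Evaluate sum_i (1/gap_i) * log(gap_i / min_gap) * log(horizon).

    Constant factors are dropped. Arms with gap 0 (the best arm) are
    skipped, since their term is undefined. This is an assumption of the sketch.
    """
    positive = [g for g in gaps if g > 0]
    if not positive:
        return 0.0
    min_gap = min(positive)  # Delta: gap between best and second-best arm
    total = sum(math.log(g / min_gap) / g for g in positive)
    return total * math.log(horizon)


def unconstrained_regret_expr(gaps, horizon):
    """Assumed benchmark: sum_i log(horizon) / gap_i over suboptimal arms."""
    return sum(math.log(horizon) / g for g in gaps if g > 0)


if __name__ == "__main__":
    gaps = [0.0, 0.1, 0.2, 0.4]  # illustrative gaps, not from the paper
    horizon = 10_000
    bounded = constant_space_regret_expr(gaps, horizon)
    baseline = unconstrained_regret_expr(gaps, horizon)
    print("constant-space expression:", bounded)
    print("unconstrained benchmark:  ", baseline)
    print("ratio:", bounded / baseline, "vs log(1/Delta):", math.log(1 / 0.1))
```

Because every gap is at most $1$, each factor $\log(\Delta_i/\Delta)$ is at most $\log(1/\Delta)$, so the ratio printed above cannot exceed $\log(1/\Delta)$. This is the sense in which the two expressions differ by an $O(\log(1/\Delta))$ factor.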
