LGJun 5, 2017

Sparse Stochastic Bandits

arXiv:1706.01383v115 citations
Originality Highly original
AI Analysis

This work addresses the problem of improving regret guarantees in bandit algorithms for sparse reward settings, which is incremental as it builds on classical bandit theory by incorporating sparsity assumptions.

The paper tackles the sparse multi-armed bandit problem, where only a small number of arms have positive expected rewards, by developing an algorithm that achieves regret scaling with the number of positive arms (s) instead of the total arms (d), and proves its optimality with matching lower bounds and simulated evaluations.

In the classical multi-armed bandit problem, d arms are available to the decision maker who pulls them sequentially in order to maximize his cumulative reward. Guarantees can be obtained on a relative quantity called regret, which scales linearly with d (or with sqrt(d) in the minimax sense). We here consider the sparse case of this classical problem in the sense that only a small number of arms, namely s < d, have a positive expected reward. We are able to leverage this additional assumption to provide an algorithm whose regret scales with s instead of d. Moreover, we prove that this algorithm is optimal by providing a matching lower bound - at least for a wide and pertinent range of parameters that we determine - and by evaluating its performance on simulated data.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes