LGJun 5, 2017

Sparse Stochastic Bandits

Joon Kwon, Vianney Perchet, Claire Vernade

arXiv:1706.01383v18.415 citations

Originality Highly original

AI Analysis

This work addresses the problem of improving regret guarantees in bandit algorithms for sparse reward settings, which is incremental as it builds on classical bandit theory by incorporating sparsity assumptions.

The paper tackles the sparse multi-armed bandit problem, where only a small number of arms have positive expected rewards, by developing an algorithm that achieves regret scaling with the number of positive arms (s) instead of the total arms (d), and proves its optimality with matching lower bounds and simulated evaluations.

In the classical multi-armed bandit problem, d arms are available to the decision maker who pulls them sequentially in order to maximize his cumulative reward. Guarantees can be obtained on a relative quantity called regret, which scales linearly with d (or with sqrt(d) in the minimax sense). We here consider the sparse case of this classical problem in the sense that only a small number of arms, namely s < d, have a positive expected reward. We are able to leverage this additional assumption to provide an algorithm whose regret scales with s instead of d. Moreover, we prove that this algorithm is optimal by providing a matching lower bound - at least for a wide and pertinent range of parameters that we determine - and by evaluating its performance on simulated data.

View on arXiv PDF

Similar