LG MA MLApr 28, 2019

Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without

Sébastien Bubeck, Yuanzhi Li, Yuval Peres, Mark Sellke

arXiv:1904.12233v213.747 citations

Originality Highly original

AI Analysis

This addresses a fundamental problem in decentralized learning for multiple agents, providing the first such guarantees in a non-stochastic setting.

The paper tackles the non-stochastic multi-player multi-armed bandit problem with no communication and maximal loss on collisions, proving a √T-type regret guarantee when collisions are announced and a sublinear T^(1-1/(2m)) guarantee without collision information.

We consider the non-stochastic version of the (cooperative) multi-player multi-armed bandit problem. The model assumes no communication at all between the players, and furthermore when two (or more) players select the same action this results in a maximal loss. We prove the first $\sqrt{T}$-type regret guarantee for this problem, under the feedback model where collisions are announced to the colliding players. Such a bound was not known even for the simpler stochastic version. We also prove the first sublinear guarantee for the feedback model where collision information is not available, namely $T^{1-\frac{1}{2m}}$ where $m$ is the number of players.

View on arXiv PDF

Similar