Multiplayer bandits without observing collision information
This addresses coordination challenges in decentralized multi-agent systems, such as wireless networks or robotics, offering incremental improvements in regret bounds for collision handling.
The paper tackles multiplayer bandit problems where players cannot communicate and collisions yield zero reward, focusing on two feedback models: with and without collision observation. It provides the first theoretical guarantees for the no-collision-info model, including logarithmic and gap-independent square-root regret algorithms, and extends these to approximate Nash equilibria in anti-coordination games.
We study multiplayer stochastic multi-armed bandit problems in which the players cannot communicate and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward. We consider two feedback models: a model in which the players can observe whether a collision has occurred and a more difficult setup when no collision information is available. We give the first theoretical guarantees for the second model: an algorithm with a logarithmic regret, and an algorithm with a square-root regret type that does not depend on the gaps between the means. For the first model, we give the first square-root regret bounds that do not depend on the gaps. Building on these ideas, we also give an algorithm for reaching approximate Nash equilibria quickly in stochastic anti-coordination games.