LG MLSep 28, 2019

An Optimal Algorithm for Multiplayer Multi-Armed Bandits

arXiv:1909.13079v26.08 citations

Originality Highly original

AI Analysis

This addresses collaboration challenges in decentralized multi-agent systems, offering a more efficient solution with reduced communication overhead.

The paper tackles the Multiplayer Multi-Armed Bandit problem by proposing DPE, a decentralized algorithm that matches the regret of an optimal centralized algorithm and improves upon the state-of-the-art SIC-MMAB, with finite expected communication rounds.

The paper addresses the Multiplayer Multi-Armed Bandit (MMAB) problem, where $M$ decision makers or players collaborate to maximize their cumulative reward. When several players select the same arm, a collision occurs and no reward is collected on this arm. Players involved in a collision are informed about this collision. We present DPE (Decentralized Parsimonious Exploration), a decentralized algorithm that achieves the same regret as that obtained by an optimal centralized algorithm. Our algorithm has better regret guarantees than the state-of-the-art algorithm SIC-MMAB \cite{boursier2019}. As in SIC-MMAB, players communicate through collisions only. An additional important advantage of DPE is that it requires very little communication. Specifically, the expected number of rounds where players use collisions to communicate is finite.

View on arXiv PDF

Similar