LGMAMLAug 14, 2020

Cooperative Multi-Agent Bandits with Heavy Tails

arXiv:2008.06244v155 citations
Originality Highly original
AI Analysis

This work addresses the problem of robust decision-making in distributed systems with heavy-tailed noise, providing a novel solution for multi-agent reinforcement learning applications.

The paper tackles the cooperative multi-agent bandit problem with heavy-tailed rewards by proposing MP-UCB, a decentralized algorithm that incorporates robust estimation and message-passing, achieving optimal regret bounds and demonstrating superiority over existing methods.

We study the heavy-tailed stochastic bandit problem in the cooperative multi-agent setting, where a group of agents interact with a common bandit problem, while communicating on a network with delays. Existing algorithms for the stochastic bandit in this setting utilize confidence intervals arising from an averaging-based communication protocol known as~\textit{running consensus}, that does not lend itself to robust estimation for heavy-tailed settings. We propose \textsc{MP-UCB}, a decentralized multi-agent algorithm for the cooperative stochastic bandit that incorporates robust estimation with a message-passing protocol. We prove optimal regret bounds for \textsc{MP-UCB} for several problem settings, and also demonstrate its superiority to existing methods. Furthermore, we establish the first lower bounds for the cooperative bandit problem, in addition to providing efficient algorithms for robust bandit estimation of location.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes