LGDCSIMLJul 7, 2020

Robust Multi-Agent Multi-Armed Bandits

arXiv:2007.03812v339 citations
Originality Highly original
AI Analysis

This addresses robustness in collaborative learning for applications like distributed computing and social recommendation systems, offering a novel solution to a known bottleneck.

The paper tackles the problem of multi-agent multi-armed bandits where some agents are malicious, showing that existing algorithms fail to improve regret with even one malicious agent, and proposes a robust algorithm that dynamically blocks malicious agents to achieve decreased regret under mild assumptions.

Recent works have shown that agents facing independent instances of a stochastic $K$-armed bandit can collaborate to decrease regret. However, these works assume that each agent always recommends their individual best-arm estimates to other agents, which is unrealistic in envisioned applications (machine faults in distributed computing or spam in social recommendation systems). Hence, we generalize the setting to include $n$ honest and $m$ malicious agents who recommend best-arm estimates and arbitrary arms, respectively. We first show that even with a single malicious agent, existing collaboration-based algorithms fail to improve regret guarantees over a single-agent baseline. We propose a scheme where honest agents learn who is malicious and dynamically reduce communication with (i.e., "block") them. We show that collaboration indeed decreases regret for this algorithm, assuming $m$ is small compared to $K$ but without assumptions on malicious agents' behavior, thus ensuring that our algorithm is robust against any malicious recommendation strategy.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes