LG MA ML · Feb 28, 2022

Robust Multi-Agent Bandits Over Undirected Graphs

Daniel Vial, Sanjay Shakkottai, R. Srikant

arXiv:2203.00076v2 · 9.63 citations

Originality: Incremental advance

AI Analysis

This paper addresses robust collaborative learning in adversarial multi-agent systems; its advance is incremental, extending known regret guarantees from the complete graph to any connected, undirected graph.

The paper tackles multi-agent bandits with malicious agents on undirected graphs. It shows that the existing state-of-the-art algorithm can fail on non-complete graphs such as the line graph, where honest agents can suffer nearly linear regret until time is doubly exponential in the number of arms and the number of honest agents. The authors propose a new algorithm whose per-agent regret depends only on that agent's number of malicious neighbors, generalizing prior bounds beyond the complete graph.

We consider a multi-agent multi-armed bandit setting in which $n$ honest agents collaborate over a network to minimize regret but $m$ malicious agents can disrupt learning arbitrarily. Assuming the network is the complete graph, existing algorithms incur $O( (m + K/n) \log (T) / \Delta)$ regret in this setting, where $K$ is the number of arms and $\Delta$ is the arm gap. For $m \ll K$, this improves over the single-agent baseline regret of $O(K\log(T)/\Delta)$. In this work, we show the situation is murkier beyond the case of a complete graph. In particular, we prove that if the state-of-the-art algorithm is used on the undirected line graph, honest agents can suffer (nearly) linear regret until time is doubly exponential in $K$ and $n$. In light of this negative result, we propose a new algorithm for which the $i$-th agent has regret $O( ( d_{\text{mal}}(i) + K/n) \log(T)/\Delta)$ on any connected and undirected graph, where $d_{\text{mal}}(i)$ is the number of $i$'s neighbors who are malicious. Thus, we generalize existing regret bounds beyond the complete graph (where $d_{\text{mal}}(i) = m$), and show the effect of malicious agents is entirely local (in the sense that only the $d_{\text{mal}}(i)$ malicious agents directly connected to $i$ affect its long-term regret).
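To make the locality claim concrete, the following is a minimal sketch that computes $d_{\text{mal}}(i)$ on a small line graph and evaluates the leading term of the stated bound for each honest agent. It is not the paper's algorithm: it only evaluates the expression inside the $O(\cdot)$ with all hidden constants dropped, and the function names, the example graph, and the values of $K$, $T$, and $\Delta$ are assumptions chosen for illustration.

```python
import math


def malicious_degree(adj, malicious):
    """Return d_mal(i), the number of malicious neighbors, for each honest agent i.

    adj: dict mapping each agent to a list of its neighbors (undirected graph).
    malicious: set of malicious agents.
    """
    return {
        i: sum(1 for j in nbrs if j in malicious)
        for i, nbrs in adj.items()
        if i not in malicious
    }


def bound_scale(d_mal, K, n, T, gap):
    """Leading term (d_mal + K/n) * log(T) / gap, with O(.) constants omitted."""
    return (d_mal + K / n) * math.log(T) / gap


def single_agent_scale(K, T, gap):
    """Leading term K * log(T) / gap of the single-agent baseline."""
    return K * math.log(T) / gap


# Hypothetical line graph 0 - 1 - 2 - 3 - 4, where agent 4 is malicious.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
malicious = {4}

d_mal = malicious_degree(adj, malicious)
n = len(d_mal)                 # number of honest agents
K, T, gap = 20, 10**5, 0.1     # illustrative values, not from the paper

scales = {i: bound_scale(d, K, n, T, gap) for i, d in d_mal.items()}
baseline = single_agent_scale(K, T, gap)

for i in sorted(scales):
    print(f"agent {i}: d_mal={d_mal[i]}, bound scale={scales[i]:.1f}, baseline={baseline:.1f}")
```

In this example only agent 3, the sole honest neighbor of the malicious agent, has a nonzero $d_{\text{mal}}(i)$, so it is the only honest agent whose bound carries the extra malicious term; agents 0, 1, and 2 retain the $K/n$ scaling.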
