Near-Optimal Regret for Distributed Adversarial Bandits: A Black-Box Approach
This addresses the challenge of efficient coordination in multi-agent learning systems, particularly in adversarial environments, with incremental improvements in regret bounds and communication efficiency.
The paper tackles the problem of distributed adversarial bandits, where multiple agents cooperate to minimize global loss with local observations, achieving a near-optimal regret bound of Ψ(√((ρ^{-1/2} + K/N)T)) and improving over prior work. It also extends the approach to distributed linear bandits, obtaining a regret bound of ł(√((ρ^{-1/2} + 1/N)dT)) with low communication cost.
We study distributed adversarial bandits, where $N$ agents cooperate to minimize the global average loss while observing only their own local losses. We show that the minimax regret for this problem is $\tildeΘ(\sqrt{(ρ^{-1/2}+K/N)T})$, where $T$ is the horizon, $K$ is the number of actions, and $ρ$ is the spectral gap of the communication matrix. Our algorithm, based on a novel black-box reduction to bandits with delayed feedback, requires agents to communicate only through gossip. It achieves an upper bound that significantly improves over the previous best bound $\tilde{O}(ρ^{-1/3}(KT)^{2/3})$ of Yi and Vojnovic (2023). We complement this result with a matching lower bound, showing that the problem's difficulty decomposes into a communication cost $ρ^{-1/4}\sqrt{T}$ and a bandit cost $\sqrt{KT/N}$. We further demonstrate the versatility of our approach by deriving first-order and best-of-both-worlds bounds in the distributed adversarial setting. Finally, we extend our framework to distributed linear bandits in $R^d$, obtaining a regret bound of $\tilde{O}(\sqrt{(ρ^{-1/2}+1/N)dT})$, achieved with only $O(d)$ communication cost per agent and per round via a volumetric spanner.