GTLGMAMLFeb 14, 2020

Coordination without communication: optimal regret in two players multi-armed bandits

arXiv:2002.07596v20.0025 citations
AI Analysis55

This addresses coordination challenges in multi-agent systems like robotics or networks where communication is limited, though it is incremental as it builds on existing bandit theory.

The paper tackles the problem of two cooperating agents playing the same stochastic three-armed bandit without communication, achieving near-optimal regret of O(√(T log T)) with very high probability of no collisions.

We consider two agents playing simultaneously the same stochastic three-armed bandit problem. The two agents are cooperating but they cannot communicate. We propose a strategy with no collisions at all between the players (with very high probability), and with near-optimal regret $O(\sqrt{T \log(T)})$. We also argue that the extra logarithmic term $\sqrt{\log(T)}$ should be necessary by proving a lower bound for a full information variant of the problem.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes