Coordination without communication: optimal regret in two players multi-armed bandits
This addresses coordination challenges in multi-agent systems like robotics or networks where communication is limited, though it is incremental as it builds on existing bandit theory.
The paper tackles the problem of two cooperating agents playing the same stochastic three-armed bandit without communication, achieving near-optimal regret of O(√(T log T)) with very high probability of no collisions.
We consider two agents playing simultaneously the same stochastic three-armed bandit problem. The two agents are cooperating but they cannot communicate. We propose a strategy with no collisions at all between the players (with very high probability), and with near-optimal regret $O(\sqrt{T \log(T)})$. We also argue that the extra logarithmic term $\sqrt{\log(T)}$ should be necessary by proving a lower bound for a full information variant of the problem.