AILGDec 3, 2020

Distributed Thompson Sampling

arXiv:2012.01789v2
AI Analysis

This work is significant for researchers and practitioners working on distributed learning and multi-agent systems, providing an incremental improvement to existing Thompson Sampling approaches.

This paper addresses the problem of minimizing cumulative regret in a cooperative multi-agent multi-armed bandit setting with M agents and K arms. The authors propose a distributed Elimination based Thompson Sampling algorithm that allows agents to learn collaboratively, demonstrating a problem-dependent upper bound on cumulative regret under Bernoulli rewards.

We study a cooperative multi-agent multi-armed bandits with M agents and K arms. The goal of the agents is to minimized the cumulative regret. We adapt a traditional Thompson Sampling algoirthm under the distributed setting. However, with agent's ability to communicate, we note that communication may further reduce the upper bound of the regret for a distributed Thompson Sampling approach. To further improve the performance of distributed Thompson Sampling, we propose a distributed Elimination based Thompson Sampling algorithm that allow the agents to learn collaboratively. We analyse the algorithm under Bernoulli reward and derived a problem dependent upper bound on the cumulative regret.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes