ML LG PRNov 16, 2020

Distributed Bandits: Probabilistic Communication on $d$-regular Graphs

arXiv:2011.07720v25.84 citations

Originality Incremental advance

AI Analysis

This work addresses communication failures in distributed bandit systems, offering incremental improvements for multi-agent learning in networked environments.

The paper tackles the decentralized multi-agent multi-armed bandit problem with probabilistic communication over d-regular graphs, proposing a new UCB-based algorithm that theoretically outperforms state-of-the-art methods in minimizing group regret, as validated by simulations.

We study the decentralized multi-agent multi-armed bandit problem for agents that communicate with probability over a network defined by a $d$-regular graph. Every edge in the graph has probabilistic weight $p$ to account for the ($1\!-\!p$) probability of a communication link failure. At each time step, each agent chooses an arm and receives a numerical reward associated with the chosen arm. After each choice, each agent observes the last obtained reward of each of its neighbors with probability $p$. We propose a new Upper Confidence Bound (UCB) based algorithm and analyze how agent-based strategies contribute to minimizing group regret in this probabilistic communication setting. We provide theoretical guarantees that our algorithm outperforms state-of-the-art algorithms. We illustrate our results and validate the theoretical claims using numerical simulations.

View on arXiv PDF

Similar