LGDCJan 23, 2022

Distributed Bandits with Heterogeneous Agents

arXiv:2201.09353v224 citations
AI Analysis

This addresses the challenge of efficient cooperation in distributed learning for heterogeneous agents, though it is incremental as it builds on existing bandit frameworks.

The paper tackles a multi-agent bandit problem with heterogeneous agents having limited local arm access and asynchronous decision-making, proposing two algorithms that achieve order-optimal regret and low communication complexity, with numerical experiments verifying their efficiency.

This paper tackles a multi-agent bandit setting where $M$ agents cooperate together to solve the same instance of a $K$-armed stochastic bandit problem. The agents are \textit{heterogeneous}: each agent has limited access to a local subset of arms and the agents are asynchronous with different gaps between decision-making rounds. The goal for each agent is to find its optimal local arm, and agents can cooperate by sharing their observations with others. While cooperation between agents improves the performance of learning, it comes with an additional complexity of communication between agents. For this heterogeneous multi-agent setting, we propose two learning algorithms, \ucbo and \AAE. We prove that both algorithms achieve order-optimal regret, which is $O\left(\sum_{i:\tildeΔ_i>0} \log T/\tildeΔ_i\right)$, where $\tildeΔ_i$ is the minimum suboptimality gap between the reward mean of arm $i$ and any local optimal arm. In addition, a careful selection of the valuable information for cooperation, \AAE achieves a low communication complexity of $O(\log T)$. Last, numerical experiments verify the efficiency of both algorithms.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes