LGMAMLDec 15, 2023

Optimal Regret Bounds for Collaborative Learning in Bandits

arXiv:2312.09674v1h-index: 14ALT
Originality Incremental advance
AI Analysis

This work solves an open problem in collaborative learning for bandits, which is important for multi-agent systems in fields like robotics or distributed AI, though it appears incremental as it builds on prior sample complexity results.

The paper tackles the problem of optimal regret minimization in collaborative multi-agent multi-armed bandits, where agents communicate through a central controller, and achieves the first algorithm with order optimal regret bounds while requiring only a small constant number of expected communication rounds.

We consider regret minimization in a general collaborative multi-agent multi-armed bandit model, in which each agent faces a finite set of arms and may communicate with other agents through a central controller. The optimal arm for each agent in this model is the arm with the largest expected mixed reward, where the mixed reward of each arm is a weighted average of its rewards across all agents, making communication among agents crucial. While near-optimal sample complexities for best arm identification are known under this collaborative model, the question of optimal regret remains open. In this work, we address this problem and propose the first algorithm with order optimal regret bounds under this collaborative bandit model. Furthermore, we show that only a small constant number of expected communication rounds is needed.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes