Adaptive Sample Sharing for Multi Agent Linear Bandits
This work addresses regret minimization in multi-agent systems, offering a novel approach that makes no structural assumptions on the bandit parameters, though it appears incremental in that it builds on existing bandit collaboration frameworks.
The paper tackles the challenge of efficient collaboration in multi-agent linear bandits by studying how data sharing affects regret minimization. The resulting BASS algorithm improves on state-of-the-art methods, as supported by theoretical analysis and empirical evaluation.
The multi-agent linear bandit is a well-known setting in which designing efficient collaboration between agents remains challenging. This paper studies the impact of data sharing among agents on regret minimization. Unlike most existing approaches, our contribution does not rely on any assumption about the structure of the bandit parameters. Our main result formalizes the trade-off between bias and uncertainty in bandit parameter estimation for efficient collaboration. This result is the cornerstone of the Bandit Adaptive Sample Sharing (BASS) algorithm, whose efficiency over the current state of the art is validated through theoretical analysis and through empirical evaluations on synthetic and real-world datasets. Furthermore, we demonstrate that, when the agents' parameters display a cluster structure, our algorithm accurately recovers this structure.
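
The bias/uncertainty trade-off mentioned above can be illustrated with a small numerical sketch. This is not the BASS algorithm, and it is not taken from the paper. It only compares a ridge estimate built from one agent's own samples with a ridge estimate built from pooled samples of two agents. Every name and value here (`ridge`, `mean_errors`, the Gaussian contexts, the sample sizes, the noise level, the parameter gaps) is an assumption chosen for illustration.

```python
import numpy as np


def ridge(X, y, lam=1.0):
    # Regularized least-squares estimate of a linear parameter.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def mean_errors(gap, n_i=20, n_j=200, d=5, trials=200, noise=0.5, seed=0):
    """Average error of agent i's estimate, without and with agent j's samples.

    `gap` is the Euclidean distance between the two agents' true parameters
    (a hypothetical knob; real agents do not know it).
    """
    rng = np.random.default_rng(seed)
    local_err, pooled_err = 0.0, 0.0
    for _ in range(trials):
        theta_i = rng.normal(size=d)
        direction = rng.normal(size=d)
        theta_j = theta_i + gap * direction / np.linalg.norm(direction)

        # Each agent observes noisy linear rewards for random contexts.
        X_i = rng.normal(size=(n_i, d))
        X_j = rng.normal(size=(n_j, d))
        y_i = X_i @ theta_i + noise * rng.normal(size=n_i)
        y_j = X_j @ theta_j + noise * rng.normal(size=n_j)

        local = ridge(X_i, y_i)
        pooled = ridge(np.vstack([X_i, X_j]), np.concatenate([y_i, y_j]))

        # Both estimates are scored against agent i's own parameter.
        local_err += np.linalg.norm(local - theta_i) / trials
        pooled_err += np.linalg.norm(pooled - theta_i) / trials
    return local_err, pooled_err


# Identical parameters: sharing only reduces uncertainty.
same_local, same_pooled = mean_errors(gap=0.0)
# Distant parameters: sharing introduces a bias that outweighs the gain.
far_local, far_pooled = mean_errors(gap=3.0)

print(f"gap=0.0  local={same_local:.3f}  pooled={same_pooled:.3f}")
print(f"gap=3.0  local={far_local:.3f}  pooled={far_pooled:.3f}")
```

In this toy setup, pooling helps when the two parameters coincide and hurts when they are far apart. An adaptive sharing rule has to make that decision from data, without knowing the gap.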