Multi-Agent Best Arm Identification in Stochastic Linear Bandits
This addresses the problem of efficient decision-making in distributed systems for applications like recommendation or resource allocation, though it is incremental as it builds on existing bandit and network frameworks.
The paper tackles collaborative best-arm identification in stochastic linear bandits across network topologies, proposing algorithms that achieve exponentially decaying error probability and outperform state-of-the-art multi-agent methods in experiments.
We study the problem of collaborative best-arm identification in stochastic linear bandits under a fixed-budget scenario. In our learning model, we first consider multiple agents connected through a star network, interacting with a linear bandit instance in parallel. We then extend our analysis to arbitrary network topologies. The objective of the agents is to collaboratively identify the best arm of the given bandit instance with the help of a central server while minimizing the probability of error in best arm estimation. To this end, we propose two algorithms, MaLinBAI-Star and MaLinBAI-Gen for star networks and networks with arbitrary structure, respectively. Both algorithms utilize the technique of G-optimal design along with the successive elimination based strategy where agents share their knowledge through a central server at each communication round. We demonstrate, both theoretically and empirically, that our algorithms achieve exponentially decaying probability of error in the allocated time budget. Furthermore, experimental results on both synthetic and real-world data validate the effectiveness of our algorithms over the state-of-the art existing multi-agent algorithms.