LGFeb 11, 2025

A Near-optimal, Scalable and Corruption-tolerant Framework for Stochastic Bandits: From Single-Agent to Multi-Agent and Beyond

arXiv:2502.07514v1h-index: 3

Originality Highly original

AI Analysis

This work addresses the challenge of robust decision-making in bandit problems under corruption, with incremental improvements in regret bounds and scalability for applications in multi-agent systems.

The paper tackles the problem of stochastic bandits with adversarial corruption by proposing BARBAT, a framework that improves upon BARBAR to achieve an optimal regret bound of O(C log T) up to logarithmic factors, eliminating a factor of K. It demonstrates extensions to various settings like graph and multi-agent bandits, offering efficiency and scalability advantages over FTRL-based methods.

We investigate various stochastic bandit problems in the presence of adversarial corruption. A seminal contribution to this area is the BARBAR~\citep{gupta2019better} algorithm, which is both simple and efficient, tolerating significant levels of corruption with nearly no degradation in performance. However, its regret upper bound exhibits a complexity of $O(KC)$, while the lower bound is $Ω(C)$. In this paper, we enhance the BARBAR algorithm by proposing a novel framework called BARBAT, which eliminates the factor of $K$ and achieves an optimal regret bound up to a logarithmic factor. We also demonstrate how BARBAT can be extended to various settings, including graph bandits, combinatorial semi-bandits, batched bandits and multi-agent bandits. In comparison to the Follow-The-Regularized-Leader (FTRL) family of methods, which provide a best-of-both-worlds guarantee, our approach is more efficient and parallelizable. Notably, FTRL-based methods face challenges in scaling to batched and multi-agent settings.

View on arXiv PDF

Similar