Efficient Swap Regret Minimization in Combinatorial Bandits
This resolves a long-standing challenge in combinatorial bandits, enabling efficient regret minimization for applications with large action spaces.
The paper tackles the problem of achieving efficient no-swap regret minimization in combinatorial bandits with exponentially large action sets, introducing an algorithm that achieves sublinear swap regret with polylogarithmic dependence on the number of actions and tight bounds for this class.
This paper addresses the problem of designing efficient no-swap regret algorithms for combinatorial bandits, where the number of actions $N$ is exponentially large in the dimensionality of the problem. In this setting, designing efficient no-swap regret translates to sublinear -- in horizon $T$ -- swap regret with polylogarithmic dependence on $N$. In contrast to the weaker notion of external regret minimization - a problem which is fairly well understood in the literature - achieving no-swap regret with a polylogarithmic dependence on $N$ has remained elusive in combinatorial bandits. Our paper resolves this challenge, by introducing a no-swap-regret learning algorithm with regret that scales polylogarithmically in $N$ and is tight for the class of combinatorial bandits. To ground our results, we also demonstrate how to implement the proposed algorithm efficiently -- that is, with a per-iteration complexity that also scales polylogarithmically in $N$ -- across a wide range of well-studied applications.