Optimal Arm Elimination Algorithms for Combinatorial Bandits
This work addresses a key bottleneck in combinatorial bandits for applications such as online recommendation and assortment optimization, offering a novel method with proven near-optimal regret.
The paper tackles the challenge of adapting arm elimination methods to combinatorial bandits, where selecting multiple arms per round is common in applications like online recommendation. It introduces a novel elimination scheme that achieves near-optimal regret in settings with general graph feedback and linear contextual bandits, outperforming UCB-based methods which can fail due to insufficient exploration.
Combinatorial bandits extend the classical bandit framework to settings where the learner selects multiple arms in each round, motivated by applications such as online recommendation and assortment optimization. While extensions of upper confidence bound (UCB) algorithms arise naturally in this context, adapting arm elimination methods has proved more challenging. We introduce a novel elimination scheme that partitions arms into three categories (confirmed, active, and eliminated), and incorporates explicit exploration to update these sets. We demonstrate the efficacy of our algorithm in two settings: the combinatorial multi-armed bandit with general graph feedback, and the combinatorial linear contextual bandit. In both cases, our approach achieves near-optimal regret, whereas UCB-based methods can provably fail due to insufficient explicit exploration. Matching lower bounds are also provided.