MLLGAug 26, 2025

Efficient Best-of-Both-Worlds Algorithms for Contextual Combinatorial Semi-Bandits

arXiv:2508.18768v11 citationsh-index: 2
Originality Highly original
AI Analysis

This work addresses the problem of efficient contextual combinatorial semi-bandits for large-scale, real-time applications, providing an incremental improvement over existing methods.

The authors tackled the problem of contextual combinatorial semi-bandits, achieving regret bounds of $widetilde{mathcal{O}}(sqrt{T})$ in the adversarial regime and $widetilde{mathcal{O}}(ln T)$ in the corrupted stochastic regime. Their approach also led to substantial per-round speed-ups, making it suitable for large-scale applications.

We introduce the first best-of-both-worlds algorithm for contextual combinatorial semi-bandits that simultaneously guarantees $\widetilde{\mathcal{O}}(\sqrt{T})$ regret in the adversarial regime and $\widetilde{\mathcal{O}}(\ln T)$ regret in the corrupted stochastic regime. Our approach builds on the Follow-the-Regularized-Leader (FTRL) framework equipped with a Shannon entropy regularizer, yielding a flexible method that admits efficient implementations. Beyond regret bounds, we tackle the practical bottleneck in FTRL (or, equivalently, Online Stochastic Mirror Descent) arising from the high-dimensional projection step encountered in each round of interaction. By leveraging the Karush-Kuhn-Tucker conditions, we transform the $K$-dimensional convex projection problem into a single-variable root-finding problem, dramatically accelerating each round. Empirical evaluations demonstrate that this combined strategy not only attains the attractive regret bounds of best-of-both-worlds algorithms but also delivers substantial per-round speed-ups, making it well-suited for large-scale, real-time applications.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes