LGDSMLAug 8, 2025

Near-Optimal Regret for Efficient Stochastic Combinatorial Semi-Bandits

arXiv:2508.06247v1h-index: 3
Originality Highly original
AI Analysis

This work addresses a key bottleneck in sequential decision-making for applications like recommendation systems, offering a near-optimal and efficient solution.

The paper tackles the trade-off between computational efficiency and regret performance in combinatorial multi-armed bandits by introducing CMOSS, an algorithm that achieves an instance-independent regret of O((log k)^2 sqrt(kmT)), eliminating a log T factor and matching the lower bound up to logarithmic terms.

The combinatorial multi-armed bandit (CMAB) is a cornerstone of sequential decision-making framework, dominated by two algorithmic families: UCB-based and adversarial methods such as follow the regularized leader (FTRL) and online mirror descent (OMD). However, prominent UCB-based approaches like CUCB suffer from additional regret factor $\log T$ that is detrimental over long horizons, while adversarial methods such as EXP3.M and HYBRID impose significant computational overhead. To resolve this trade-off, we introduce the Combinatorial Minimax Optimal Strategy in the Stochastic setting (CMOSS). CMOSS is a computationally efficient algorithm that achieves an instance-independent regret of $O\big( (\log k)^2\sqrt{kmT}\big )$ under semi-bandit feedback, where $m$ is the number of arms and $k$ is the maximum cardinality of a feasible action. Crucially, this result eliminates the dependency on $\log T$ and matches the established $Ω\big( \sqrt{kmT}\big)$ lower bound up to $O\big((\log k)^2\big)$. We then extend our analysis to show that CMOSS is also applicable to cascading feedback. Experiments on synthetic and real-world datasets validate that CMOSS consistently outperforms benchmark algorithms in both regret and runtime efficiency.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes