MLLGOct 24, 2025

Oracle-Efficient Combinatorial Semi-Bandits

arXiv:2510.21431v11 citationsh-index: 2
Originality Incremental advance
AI Analysis

This work significantly improves the scalability of combinatorial semi-bandit algorithms, making them more practical for applications requiring frequent combinatorial optimization.

This paper addresses the scalability issue in combinatorial semi-bandit problems, where an agent selects a subset of base arms and receives individual feedback. The authors propose oracle-efficient frameworks that reduce oracle calls from linear to doubly logarithmic in time, achieving $\tilde{O}(\sqrt{T})$ regret for worst-case linear reward settings using only $O(\log\log T)$ oracle queries.

We study the combinatorial semi-bandit problem where an agent selects a subset of base arms and receives individual feedback. While this generalizes the classical multi-armed bandit and has broad applicability, its scalability is limited by the high cost of combinatorial optimization, requiring oracle queries at every round. To tackle this, we propose oracle-efficient frameworks that significantly reduce oracle calls while maintaining tight regret guarantees. For the worst-case linear reward setting, our algorithms achieve $\tilde{O}(\sqrt{T})$ regret using only $O(\log\log T)$ oracle queries. We also propose covariance-adaptive algorithms that leverage noise structure for improved regret, and extend our approach to general (non-linear) rewards. Overall, our methods reduce oracle usage from linear to (doubly) logarithmic in time, with strong theoretical guarantees.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes