ML LGOct 24, 2025

Oracle-Efficient Combinatorial Semi-Bandits

Jung-hun Kim, Milan Vojnović, Min-hwan Oh

arXiv:2510.21431v17.81 citationsh-index: 2

Originality Incremental advance

AI Analysis

This work significantly improves the scalability of combinatorial semi-bandit algorithms, making them more practical for applications requiring frequent combinatorial optimization.

This paper addresses the scalability issue in combinatorial semi-bandit problems, where an agent selects a subset of base arms and receives individual feedback. The authors propose oracle-efficient frameworks that reduce oracle calls from linear to doubly logarithmic in time, achieving $\tilde{O}(\sqrt{T})$ regret for worst-case linear reward settings using only $O(\log\log T)$ oracle queries.

We study the combinatorial semi-bandit problem where an agent selects a subset of base arms and receives individual feedback. While this generalizes the classical multi-armed bandit and has broad applicability, its scalability is limited by the high cost of combinatorial optimization, requiring oracle queries at every round. To tackle this, we propose oracle-efficient frameworks that significantly reduce oracle calls while maintaining tight regret guarantees. For the worst-case linear reward setting, our algorithms achieve $\tilde{O}(\sqrt{T})$ regret using only $O(\log\log T)$ oracle queries. We also propose covariance-adaptive algorithms that leverage noise structure for improved regret, and extend our approach to general (non-linear) rewards. Overall, our methods reduce oracle usage from linear to (doubly) logarithmic in time, with strong theoretical guarantees.

View on arXiv PDF

Similar