ML LGJan 21, 2021

Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback

Marc Jourdan, Mojmír Mutný, Johannes Kirschner, Andreas Krause

arXiv:2101.08534v117.835 citations

Originality Incremental advance

AI Analysis

This work addresses the challenge of efficiently identifying optimal actions in structured decision-making problems, such as matroids or graph paths, which is incremental but provides practical computational improvements.

The paper tackles the pure-exploration problem in combinatorial bandits with semi-bandit feedback, developing a meta-algorithm that yields asymptotically optimal algorithms with finite-time guarantees and introducing the first computationally efficient, asymptotically optimal algorithm for best-arm identification in this setting.

Combinatorial bandits with semi-bandit feedback generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set. The action set satisfies a given structure such as forming a base of a matroid or a path in a graph. We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set. Using the recently popularized game framework, we interpret this problem as a sequential zero-sum game and develop a CombGame meta-algorithm whose instances are asymptotically optimal algorithms with finite time guarantees. In addition to comparing two families of learners to instantiate our meta-algorithm, the main contribution of our work is a specific oracle efficient instance for best-arm identification with combinatorial actions. Based on a projection-free online learning algorithm for convex polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.

View on arXiv PDF

Similar