MLLGJan 21, 2021

Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback

arXiv:2101.08534v135 citations
Originality Incremental advance
AI Analysis

This work addresses the challenge of efficiently identifying optimal actions in structured decision-making problems, such as matroids or graph paths, which is incremental but provides practical computational improvements.

The paper tackles the pure-exploration problem in combinatorial bandits with semi-bandit feedback, developing a meta-algorithm that yields asymptotically optimal algorithms with finite-time guarantees and introducing the first computationally efficient, asymptotically optimal algorithm for best-arm identification in this setting.

Combinatorial bandits with semi-bandit feedback generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set. The action set satisfies a given structure such as forming a base of a matroid or a path in a graph. We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set. Using the recently popularized game framework, we interpret this problem as a sequential zero-sum game and develop a CombGame meta-algorithm whose instances are asymptotically optimal algorithms with finite time guarantees. In addition to comparing two families of learners to instantiate our meta-algorithm, the main contribution of our work is a specific oracle efficient instance for best-arm identification with combinatorial actions. Based on a projection-free online learning algorithm for convex polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes