LG, ML · Feb 9, 2019

Pure Exploration with Multiple Correct Answers

arXiv:1902.03475v1 · 22.1 · 100 citations

Originality: Incremental advance

AI Analysis

The paper addresses a theoretical gap in bandit algorithms for settings with multiple correct answers. The contribution is incremental in that it builds on existing single-answer methods.

The paper studies the sample complexity of pure-exploration bandit problems with multiple correct answers. It derives a lower bound and presents a new algorithm, an extension of Track-and-Stop, whose asymptotic sample complexity matches that bound.
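For orientation, lower bounds of this kind are usually written in the following max-min form. The notation is ours and the paper's exact statement may differ: $\mu$ is the bandit instance, $i^*(\mu)$ its set of correct answers, $\neg i$ the set of instances on which answer $i$ is not correct, $\Delta_K$ the probability simplex over the $K$ arms, $d(\cdot,\cdot)$ the KL divergence between arm distributions, and $\tau_\delta$ the stopping time of a $\delta$-correct algorithm.

```latex
\liminf_{\delta \to 0} \frac{\mathbb{E}_\mu[\tau_\delta]}{\log(1/\delta)} \ge T^*(\mu),
\qquad
T^*(\mu)^{-1} = \max_{i \in i^*(\mu)} \; \max_{w \in \Delta_K} \; \inf_{\lambda \in \neg i} \sum_{k=1}^{K} w_k \, d(\mu_k, \lambda_k).
```

With a single correct answer the outer maximum over $i$ disappears, leaving the single-answer characteristic time that Track-and-Stop is analysed against. The extra maximum over answers is what can make the set of optimal weights $w$ non-convex.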

We determine the sample complexity of pure exploration bandit problems with multiple good answers. We derive a lower bound using a new game equilibrium argument. We show how continuity and convexity properties of single-answer problems ensure that the Track-and-Stop algorithm has asymptotically optimal sample complexity. However, that convexity is lost when going to the multiple-answer setting. We present a new algorithm which extends Track-and-Stop to the multiple-answer case and has asymptotic sample complexity matching the lower bound.
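The loss of convexity can be seen on a small toy problem. The sketch below is a hypothetical illustration, not taken from the paper: with unit-variance Gaussian arms, the task is to return any arm whose mean exceeds a threshold `gamma`, so every arm above `gamma` is a correct answer. For a fixed answer `i`, the inner value is an infimum of functions linear in `w`, hence concave, and its maximisers form a convex set. Taking the maximum over several answers breaks this: when two arms tie, each vertex `e1`, `e2` of the simplex is optimal, but their midpoint is not. All function names here are our own assumptions.

```python
"""Hypothetical toy problem (not from the paper): find any arm whose mean
exceeds a threshold gamma, with unit-variance Gaussian arms."""

from typing import Sequence


def gaussian_kl(x: float, y: float) -> float:
    """KL divergence between N(x, 1) and N(y, 1)."""
    return (x - y) ** 2 / 2.0


def inner_value(w: Sequence[float], mu: Sequence[float], i: int, gamma: float) -> float:
    """inf over lambda in "not i" of sum_k w_k * d(mu_k, lambda_k).

    Answer i is wrong for lambda iff lambda_i <= gamma, so the cheapest
    alternative only moves arm i down to gamma (if it starts above gamma).
    """
    if mu[i] <= gamma:
        return 0.0  # mu itself already makes answer i wrong
    return w[i] * gaussian_kl(mu[i], gamma)


def game_value(w: Sequence[float], mu: Sequence[float], gamma: float) -> float:
    """Max over answers i of the inner value, for a fixed weight vector w."""
    return max(inner_value(w, mu, i, gamma) for i in range(len(mu)))


def characteristic_time(mu: Sequence[float], gamma: float) -> float:
    """T*(mu) for the toy problem: all weight on the arm with the largest mean."""
    best = max(mu)
    if best <= gamma:
        raise ValueError("toy problem assumes at least one arm above gamma")
    return 1.0 / gaussian_kl(best, gamma)


if __name__ == "__main__":
    mu, gamma = [1.0, 1.0, 0.0], 0.0
    e1, e2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
    midpoint = [0.5, 0.5, 0.0]
    print(game_value(e1, mu, gamma), game_value(e2, mu, gamma))  # 0.5 0.5
    print(game_value(midpoint, mu, gamma))  # 0.25: averaging two optima loses half
    print(characteristic_time(mu, gamma))  # 2.0
```

Both vertices attain the optimal value 0.5, while their average attains only 0.25. A tracking rule whose empirical optimum keeps switching between tied answers can drift toward such averages, which suggests why Track-and-Stop needs modification in the multiple-answer setting.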
