The Batch Complexity of Bandit Pure Exploration
This work addresses the efficiency of batched algorithms for bandit pure exploration, which is incremental as it builds on existing methods by focusing on batch complexity.
The paper tackles the problem of minimizing the number of batches in fixed-confidence pure exploration for stochastic multi-armed bandits, providing an instance-dependent lower bound and a general algorithm with proven upper bounds on sample and batch complexity, illustrated through best-arm identification and thresholding bandits.
In a fixed-confidence pure exploration problem in stochastic multi-armed bandits, an algorithm iteratively samples arms and should stop as early as possible and return the correct answer to a query about the arms distributions. We are interested in batched methods, which change their sampling behaviour only a few times, between batches of observations. We give an instance-dependent lower bound on the number of batches used by any sample efficient algorithm for any pure exploration task. We then give a general batched algorithm and prove upper bounds on its expected sample complexity and batch complexity. We illustrate both lower and upper bounds on best-arm identification and thresholding bandits.