ML LG, Nov 15, 2018

Pure-Exploration for Infinite-Armed Bandits with General Arm Reservoirs

Maryam Aziz, Kevin Jamieson, Javed Aslam

arXiv:1811.06149v23.53 citations

Originality: Incremental advance

AI Analysis

The paper addresses a fundamental challenge in pure-exploration bandits with very large arm sets, as arise in hyperparameter tuning. Its advance is incremental: it extends existing analysis from specific reservoir distributions to general ones.

The paper studies the problem of finding an ε-good arm in infinite-armed bandits with a general reservoir distribution. It establishes necessary and sufficient conditions on the total budget and gives an algorithm based on successive halving that begins discarding arms after a single pull. The algorithm's provable correctness offers an explanation for why the most aggressive bracket of Hyperband is almost always best in practice.

This paper considers a multi-armed bandit game where the number of arms is much larger than the maximum budget and is effectively infinite. We characterize necessary and sufficient conditions on the total budget for an algorithm to return an ε-good arm with probability at least 1 - δ. In such situations, the sample complexity depends on ε, δ and the so-called reservoir distribution ν from which the means of the arms are drawn iid. While a substantial literature has developed around analyzing specific cases of ν such as the beta distribution, our analysis makes no assumption about the form of ν. Our algorithm is based on successive halving with the surprising exception that arms start to be discarded after just a single pull, requiring an analysis that goes beyond concentration alone. The provable correctness of this algorithm also provides an explanation for the empirical observation that the most aggressive bracket of the Hyperband algorithm of Li et al. (2017) for hyperparameter tuning is almost always best.
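
To make the idea of successive halving with discarding after a single pull concrete, here is a minimal sketch. It is not the authors' algorithm. It assumes Bernoulli rewards, a per-round pull count that starts at one and doubles, and a rule that keeps the top half of arms by empirical mean. The function name `successive_halving` and its parameters are hypothetical, chosen for illustration only.

```python
import random


def successive_halving(n_arms, reservoir, rng):
    """Illustrative successive halving over arms drawn from a reservoir.

    Assumptions (not taken from the paper): Bernoulli rewards, a pull count
    per round that starts at 1 and doubles, and keeping the top half of the
    surviving arms by empirical mean after every round.
    """
    if n_arms < 1:
        raise ValueError("n_arms must be at least 1")

    # Each arm's mean is drawn iid from the reservoir distribution.
    means = [reservoir(rng) for _ in range(n_arms)]
    survivors = list(range(n_arms))
    sums = [0.0] * n_arms
    pulls = [0] * n_arms
    total_pulls = 0
    pulls_this_round = 1  # first discards happen after a single pull

    while len(survivors) > 1:
        for arm in survivors:
            for _ in range(pulls_this_round):
                # Bernoulli reward with the arm's (hidden) mean.
                sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
            pulls[arm] += pulls_this_round
        total_pulls += pulls_this_round * len(survivors)

        # Keep the better half by empirical mean; ties broken by arm index.
        survivors.sort(key=lambda a: (-sums[a] / pulls[a], a))
        survivors = survivors[: (len(survivors) + 1) // 2]
        pulls_this_round *= 2

    best = survivors[0]
    return best, means[best], total_pulls
```

A call such as `successive_halving(64, lambda r: r.betavariate(1.0, 3.0), random.Random(0))` draws 64 arm means from a Beta(1, 3) reservoir, which is itself an arbitrary choice for the example, and returns the index of the surviving arm, its true mean, and the number of pulls spent. Because the number of survivors halves while the pulls per arm double, each round in this sketch costs roughly the same budget.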
