The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime
This work is aimed at researchers in bandit algorithms. It offers a new perspective on the analysis of adaptive sampling that closes gaps between existing lower-bound techniques and leads to practical algorithmic improvements, though it builds incrementally on existing methods.
The paper tackles the analysis of adaptive sampling in multi-arm bandit problems by introducing the Simulator technique, which asks how hard it is to distinguish a good sampling strategy from a bad one given limited data. The results include the first instance-based lower bounds for the top-k problem with the appropriate log factors, new phenomena concerning how often individual arms must be pulled, and a near-optimal algorithm that outperforms state-of-the-art methods in experiments.
We propose a novel technique for analyzing adaptive sampling called the {\em Simulator}. Our approach differs from existing methods by considering not how much information could be gathered by any fixed sampling strategy, but how difficult it is to distinguish a good sampling strategy from a bad one given the limited amount of data collected up to any given time. This change of perspective allows us to match the strengths of both Fano and change-of-measure techniques, without succumbing to the limitations of either method. For concreteness, we apply our techniques to a structured multi-arm bandit problem in the fixed-confidence pure exploration setting, where we show that the constraints on the means imply a substantial gap between the moderate-confidence sample complexity and the asymptotic sample complexity as $\delta \to 0$ found in the literature. We also prove the first instance-based lower bounds for the top-k problem that incorporate the appropriate log factors. Moreover, our lower bounds zero in on the number of times each \emph{individual} arm needs to be pulled, uncovering new phenomena that are drowned out in the aggregate sample complexity. Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first {\em practical} algorithm of its kind for the latter problem that removes extraneous log factors, and it outperforms the state of the art in experiments.
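To make the fixed-confidence pure exploration setting concrete, the following is a minimal sketch of best-arm identification using generic successive elimination. It is a standard textbook baseline and not the algorithm proposed in this paper. The function name `successive_elimination`, the unit-variance Gaussian reward model, the particular confidence radius, and the `max_rounds` cap are all assumptions made for illustration.

```python
import math
import random


def successive_elimination(means, delta, rng, max_rounds=100_000):
    """Return (index of the arm believed best, pulls per arm).

    Generic successive elimination for fixed-confidence best-arm
    identification. Assumes unit-variance Gaussian rewards; this is an
    illustrative baseline, not the paper's algorithm.
    """
    n = len(means)
    active = list(range(n))
    sums = [0.0] * n
    pulls = [0] * n
    for t in range(1, max_rounds + 1):
        # Pull every surviving arm once per round, so each has t samples.
        for i in active:
            sums[i] += rng.gauss(means[i], 1.0)
            pulls[i] += 1
        # Anytime confidence radius from a union bound over arms and rounds
        # (an assumed, deliberately simple choice).
        radius = math.sqrt(2.0 * math.log(4.0 * n * t * t / delta) / t)
        best_mean = max(sums[i] / t for i in active)
        # Drop arms whose upper bound falls below the leader's lower bound.
        active = [i for i in active if sums[i] / t + radius >= best_mean - radius]
        if len(active) == 1:
            break
    # If the round budget runs out, fall back to the empirical leader.
    winner = max(active, key=lambda i: sums[i] / pulls[i])
    return winner, pulls


if __name__ == "__main__":
    winner, pulls = successive_elimination([1.0, 0.0, 0.0], 0.05, random.Random(0))
    print("selected arm:", winner, "pulls per arm:", pulls)
```

The returned `pulls` list records how many samples each arm received before the procedure stopped. The abstract's lower bounds concern this kind of per-arm quantity, rather than only the total `sum(pulls)`.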