Optimal Best-arm Identification in Linear Bandits
This work addresses a core challenge in bandit optimization for decision-making systems, offering a more efficient solution with theoretical guarantees, though it appears incremental in improving upon prior strategies.
The paper tackles the problem of best-arm identification in stochastic linear bandits with fixed confidence, aiming to minimize sampling budget while ensuring certainty, and presents an algorithm that matches known lower bounds in complexity and outperforms existing methods in experiments.
We study the problem of best-arm identification with fixed confidence in stochastic linear bandits. The objective is to identify the best arm with a given level of certainty while minimizing the sampling budget. We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds, asymptotically almost surely and in expectation. The algorithm relies on an arm sampling rule that tracks an optimal proportion of arm draws, and that remarkably can be updated as rarely as we wish, without compromising its theoretical guarantees. Moreover, unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms. Experimental results suggest that our algorithm significantly outperforms existing algorithms. The paper further provides a first analysis of the best-arm identification problem in linear bandits with a continuous set of arms.