Fixed Confidence Best Arm Identification in the Bayesian Setting
This work addresses the problem of efficiently identifying the best arm in Bayesian bandits for researchers and practitioners, representing an incremental improvement by adapting existing methods to a new setting.
The paper tackles the fixed-confidence best arm identification problem in the Bayesian setting, showing that traditional frequentist algorithms like track-and-stop are arbitrarily suboptimal and introducing a variant of successive elimination that matches a derived lower bound up to a logarithmic factor, with simulations verifying these results.
We consider the fixed-confidence best arm identification (FC-BAI) problem in the Bayesian setting. This problem aims to find the arm of the largest mean with a fixed confidence level when the bandit model has been sampled from the known prior. Most studies on the FC-BAI problem have been conducted in the frequentist setting, where the bandit model is predetermined before the game starts. We show that the traditional FC-BAI algorithms studied in the frequentist setting, such as track-and-stop and top-two algorithms, result in arbitrarily suboptimal performances in the Bayesian setting. We also obtain a lower bound of the expected number of samples in the Bayesian setting and introduce a variant of successive elimination that has a matching performance with the lower bound up to a logarithmic factor. Simulations verify the theoretical results.