Optimal Best Markovian Arm Identification with Fixed Confidence
This work addresses the sampling complexity of best arm identification in Markovian bandits, providing theoretical guarantees for a specific model, but it is incremental as it extends existing IID methods to a Markovian setting.
The paper tackles the problem of identifying the best arm in a one-parameter Markovian bandit model with fixed confidence, deriving instance-specific lower bounds and analyzing the Track-and-Stop strategy, which is shown to be at most a factor of four from optimal asymptotically.
We give a complete characterization of the sampling complexity of best Markovian arm identification in one-parameter Markovian bandit models. We derive instance specific nonasymptotic and asymptotic lower bounds which generalize those of the IID setting. We analyze the Track-and-Stop strategy, initially proposed for the IID setting, and we prove that asymptotically it is at most a factor of four apart from the lower bound. Our one-parameter Markovian bandit model is based on the notion of an exponential family of stochastic matrices for which we establish many useful properties. For the analysis of the Track-and-Stop strategy we derive a novel concentration inequality for Markov chains that may be of interest in its own right.