STLGMLDec 2, 2019

Optimal Best Markovian Arm Identification with Fixed Confidence

arXiv:1912.00636v324 citations
Originality Incremental advance
AI Analysis

This work addresses the sampling complexity of best arm identification in Markovian bandits, providing theoretical guarantees for a specific model, but it is incremental as it extends existing IID methods to a Markovian setting.

The paper tackles the problem of identifying the best arm in a one-parameter Markovian bandit model with fixed confidence, deriving instance-specific lower bounds and analyzing the Track-and-Stop strategy, which is shown to be at most a factor of four from optimal asymptotically.

We give a complete characterization of the sampling complexity of best Markovian arm identification in one-parameter Markovian bandit models. We derive instance specific nonasymptotic and asymptotic lower bounds which generalize those of the IID setting. We analyze the Track-and-Stop strategy, initially proposed for the IID setting, and we prove that asymptotically it is at most a factor of four apart from the lower bound. Our one-parameter Markovian bandit model is based on the notion of an exponential family of stochastic matrices for which we establish many useful properties. For the analysis of the Track-and-Stop strategy we derive a novel concentration inequality for Markov chains that may be of interest in its own right.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes