LG MLMay 27, 2019

Learning Multiple Markov Chains via Adaptive Allocation

Mohammad Sadegh Talebi, Odalric-Ambrym Maillard

arXiv:1905.11128v22.71 citations

Originality Incremental advance

AI Analysis

This work addresses a fundamental problem in sequential decision-making for learning multiple Markov chains, with potential applications in reinforcement learning and control, though it appears incremental in extending existing methods to this specific setting.

The paper tackles the problem of learning transition matrices for multiple Markov chains from sequential observations, introducing a novel algorithm that balances exploration and exploitation without prior knowledge. It provides finite-sample PAC-type guarantees and shows the algorithm asymptotically achieves optimal loss.

We study the problem of learning the transition matrices of a set of Markov chains from a single stream of observations on each chain. We assume that the Markov chains are ergodic but otherwise unknown. The learner can sample Markov chains sequentially to observe their states. The goal of the learner is to sequentially select various chains to learn transition matrices uniformly well with respect to some loss function. We introduce a notion of loss that naturally extends the squared loss for learning distributions to the case of Markov chains, and further characterize the notion of being \emph{uniformly good} in all problem instances. We present a novel learning algorithm that efficiently balances \emph{exploration} and \emph{exploitation} intrinsic to this problem, without any prior knowledge of the chains. We provide finite-sample PAC-type guarantees on the performance of the algorithm. Further, we show that our algorithm asymptotically attains an optimal loss.

View on arXiv PDF

Similar