MLLGSTNov 14, 2022

Offline Estimation of Controlled Markov Chains: Minimaxity and Sample Complexity

arXiv:2211.07092v45 citationsh-index: 1
Originality Incremental advance
AI Analysis

This work provides theoretical foundations for offline estimation in controlled Markov chains, which is incremental but important for applications like reinforcement learning and policy evaluation.

The paper tackles the problem of estimating transition probabilities in controlled Markov chains from offline data, establishing sample complexity bounds and minimax optimality conditions that depend on the mixing properties of the logging policy used for data collection.

In this work, we study a natural nonparametric estimator of the transition probability matrices of a finite controlled Markov chain. We consider an offline setting with a fixed dataset, collected using a so-called logging policy. We develop sample complexity bounds for the estimator and establish conditions for minimaxity. Our statistical bounds depend on the logging policy through its mixing properties. We show that achieving a particular statistical risk bound involves a subtle and interesting trade-off between the strength of the mixing properties and the number of samples. We demonstrate the validity of our results under various examples, such as ergodic Markov chains, weakly ergodic inhomogeneous Markov chains, and controlled Markov chains with non-stationary Markov, episodic, and greedy controls. Lastly, we use these sample complexity bounds to establish concomitant ones for offline evaluation of stationary Markov control policies.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes