Learning from non-irreducible Markov chains
This work addresses a theoretical gap for machine learning practitioners dealing with temporally dependent data, though it is incremental as it extends existing results from irreducible to non-irreducible Markov chains.
The paper tackles the problem of supervised learning when training data is drawn from non-irreducible Markov chains, which violate the i.i.d. assumption common in machine learning. It proves learnability and generalization bounds for an approximate sample error minimization algorithm under uniform ergodicity and regularity conditions.
Most of the existing literature on supervised machine learning problems focuses on the case when the training data set is drawn from an i.i.d. sample. However, many practical problems are characterized by temporal dependence and strong correlation between the marginals of the data-generating process, suggesting that the i.i.d. assumption is not always justified. This problem has already been considered in the context of Markov chains satisfying the Doeblin condition. This condition, among other things, implies that the chain is not singular in its behavior, i.e. it is irreducible. In this article, we focus on the case when the training data set is drawn from a not necessarily irreducible Markov chain. Under the assumption that the chain is uniformly ergodic with respect to the $\mathrm{L}^1$-Wasserstein distance, and certain regularity assumptions on the hypothesis class and the state space of the chain, we first obtain a uniform convergence result for the corresponding sample error, and then we conclude learnability of the approximate sample error minimization algorithm and derive its generalization bounds. Finally, a relative uniform convergence result for the sample error is also discussed.
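To make the setting concrete, the following is a minimal sketch of approximate sample error minimization on data drawn from a non-irreducible chain. Everything in it is an illustrative assumption rather than the paper's construction: the chain $X_{k+1} = (X_k + \xi_{k+1})/2$ with $\xi_k$ i.i.d. Bernoulli(1/2) on $[0,1]$ (driving two copies with the same $\xi_k$ halves their distance at each step, so the chain contracts in the $\mathrm{L}^1$-Wasserstein distance, yet from a given starting point it only visits a countable set, so it is not irreducible), the hypothetical linear hypothesis class $h_a(x) = ax$, the squared loss, and the names `simulate_chain`, `sample_error`, and `approximate_sem`.

```python
import random


def simulate_chain(n, x0=0.0, seed=0):
    """Simulate X_{k+1} = (X_k + xi_{k+1}) / 2 with xi ~ Bernoulli(1/2).

    Illustrative chain only (an assumption, not taken from the paper).
    """
    rng = random.Random(seed)
    xs = []
    x = x0
    for _ in range(n):
        x = (x + rng.randint(0, 1)) / 2.0
        xs.append(x)
    return xs


def sample_error(a, xs, ys):
    """Average squared loss of the hypothesis h_a(x) = a * x on the sample."""
    return sum((a * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def approximate_sem(xs, ys, grid, eps=0.0):
    """Return the first hypothesis on the grid whose sample error is
    within eps of the smallest sample error over the grid."""
    errors = [(sample_error(a, xs, ys), a) for a in grid]
    best = min(err for err, _ in errors)
    for err, a in errors:
        if err <= best + eps:
            return a, err


# Hypothetical target: labels are generated by y = 0.5 * x.
xs = simulate_chain(2000)
ys = [0.5 * x for x in xs]
grid = [i / 10 for i in range(11)]  # candidate slopes 0.0, 0.1, ..., 1.0
a_hat, err_hat = approximate_sem(xs, ys, grid, eps=0.0)
```

With `eps=0.0` the routine reduces to exact sample error minimization over the grid; a positive `eps` accepts any hypothesis whose sample error is within `eps` of the minimum, which is the sense of "approximate" used in this sketch.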