Concentration Phenomenon for Random Dynamical Systems: An Operator Theoretic Approach
This work provides theoretical tools for reinforcement learning and control communities, enabling concentration inequalities for unbounded reward functions without requiring exact system knowledge or reversibility, though it is incremental in building on existing operator theory.
The paper tackles the problem of deriving concentration bounds for unbounded observables in Markov chains with unbounded state spaces, using an operator-theoretic approach to circumvent traditional probabilistic methods, resulting in sharp non-asymptotic concentration inequalities under conditions like hyperboundedness and transport-entropy inequalities.
Via operator theoretic methods, we formalize the concentration phenomenon for a given observable `$r$' of a discrete time Markov chain with `$μ_π$' as invariant ergodic measure, possibly having support on an unbounded state space. The main contribution of this paper is circumventing tedious probabilistic methods with a study of a composition of the Markov transition operator $P$ followed by a multiplication operator defined by $e^{r}$. It turns out that even if the observable/ reward function is unbounded, but for some for some $q>2$, $\|e^{r}\|_{q \rightarrow 2} \propto \exp\big(μ_π(r) +\frac{2q}{q-2}\big) $ and $P$ is hyperbounded with norm control $\|P\|_{2 \rightarrow q }< e^{\frac{1}{2}[\frac{1}{2}-\frac{1}{q}]}$, sharp non-asymptotic concentration bounds follow. \emph{Transport-entropy} inequality ensures the aforementioned upper bound on multiplication operator for all $q>2$. The role of \emph{reversibility} in concentration phenomenon is demystified. These results are particularly useful for the reinforcement learning and controls communities as they allow for concentration inequalities w.r.t standard unbounded obersvables/reward functions where exact knowledge of the system is not available, let alone the reversibility of stationary measure.