Rates of Convergence in the Central Limit Theorem for Markov Chains, with an Application to TD Learning
This work provides theoretical guarantees for TD learning algorithms in reinforcement learning. The contribution is incremental: it extends existing central limit theorem results to a specific application.
The authors establish non-asymptotic convergence rates in the central limit theorem for Markov chains and apply them to Temporal Difference (TD) learning with averaging, yielding a theoretical framework for analyzing the convergence of TD learning.
We prove a non-asymptotic central limit theorem for vector-valued martingale differences using Stein's method, and use Poisson's equation to extend the result to functions of Markov chains. We then show that these results can be applied to establish a non-asymptotic central limit theorem for Temporal Difference (TD) learning with averaging.
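
To make "TD learning with averaging" concrete, the following is a minimal sketch of TD(0) with linear function approximation and a running (Polyak-Ruppert style) average of the iterates. It is an illustration under stated assumptions, not the exact algorithm or setting analyzed in the paper: the function name `td0_with_averaging`, the two-state chain, the tabular features, and the step-size schedule k^(-0.6) are all hypothetical choices made for this example.

```python
import numpy as np


def td0_with_averaging(P, r, Phi, gamma, n_steps, step_size, seed=0):
    """TD(0) with linear features and a running average of the iterates.

    P         : (n, n) transition matrix of the Markov chain (assumed ergodic)
    r         : (n,) reward received in each state
    Phi       : (n, d) feature matrix, one row per state
    gamma     : discount factor in [0, 1)
    step_size : callable k -> step size at iteration k (an assumed schedule)
    Returns the last iterate and the average of all iterates.
    """
    rng = np.random.default_rng(seed)
    n_states, d = Phi.shape
    theta = np.zeros(d)      # current TD iterate
    theta_bar = np.zeros(d)  # running average of the iterates
    s = 0                    # arbitrary initial state
    for k in range(1, n_steps + 1):
        s_next = rng.choice(n_states, p=P[s])
        # Temporal difference error for the observed transition s -> s_next.
        td_error = r[s] + gamma * (Phi[s_next] @ theta) - Phi[s] @ theta
        theta = theta + step_size(k) * td_error * Phi[s]
        # Incremental update of the average of theta_1, ..., theta_k.
        theta_bar += (theta - theta_bar) / k
        s = s_next
    return theta, theta_bar


# Hypothetical two-state example with tabular (one-hot) features.
P = np.array([[0.9, 0.1], [0.2, 0.8]])
r = np.array([1.0, 0.0])
Phi = np.eye(2)
gamma = 0.5

theta_last, theta_avg = td0_with_averaging(
    P, r, Phi, gamma, n_steps=20000, step_size=lambda k: k ** -0.6
)

# With tabular features the TD fixed point is the true value function,
# which solves (I - gamma * P) V = r.
V_true = np.linalg.solve(np.eye(2) - gamma * P, r)
print("averaged iterate:", theta_avg)
print("true values     :", V_true)
```

A central limit theorem for this scheme concerns the distribution of the rescaled error sqrt(n) * (theta_avg - theta*), where theta* is the TD fixed point; a non-asymptotic version bounds how far that distribution is from its Gaussian limit at a finite n.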