Finite-Time Accuracy of Temporal-Difference Learning Under Schur-Stable Recursions
This work addresses the need for improved statistical efficiency in policy evaluation for reinforcement learning practitioners, though it is incremental as it builds on existing finite-time analysis efforts.
The paper tackles the problem of finite-time error bounds for temporal-difference (TD) learning in reinforcement learning by developing a new analysis based on a discrete-time stochastic linear system representation and Schur stability, resulting in specific bounds and a reusable framework for future research.
Temporal difference (TD) learning is a cornerstone reinforcement learning (RL) method for policy evaluation, where the goal is to estimate the value function of a Markov decision process under a fixed policy. While a substantial body of work has established its convergence and stability properties, more recent efforts have focused on its statistical efficiency through finite-time error bounds. In this paper, we advance this line of research by developing a new finite-time error analysis for tabular TD learning that directly exploits a discrete-time stochastic linear system representation and leverages Schur stability of the associated matrices. Beyond the specific bounds obtained, the proposed framework provides a reusable template for analyzing TD learning and related RL algorithms, and it offers control-theoretic insights that may guide future developments in finite-sample RL theory.