MLLGJun 4

Fast and Robust Convergence Rate for TD(0) with Linear Function Approximation, Universal Learning Steps and I.I.D. Samples

arXiv:2606.0596718.8
AI Analysis

For reinforcement learning researchers, this provides a theoretically optimal and robust convergence guarantee for a fundamental algorithm, addressing a long-standing bottleneck in TD learning theory.

The paper establishes a new O(1/k) convergence rate for TD(0) with linear function approximation that is robust to ill-conditioning and does not depend on the smallest eigenvalue of the covariance matrix, unlike prior work. The rate is sharp up to a multiplicative constant less than 11.

In this paper, we study the finite-time behavior of the TD(0) temporal-difference method with linear function approximation (LFA). We consider on-policy independent and identically distributed (i.i.d.) samples, a constant learning step, and the Polyak-Juditsky averaging method. We establish a new convergence rate, for the Mean-Square Error (MSE) on the approximated function, that is (i) fast in the sense that it admits an optimal dependency in the number of iterations k (i.e., of order 1/k), (ii) robust to ill-conditioning: it only depends on an initial error and modelindependent constants and (iii) sharp up to a multiplicative constant lower than 11. In particular, it does not depend on the smallest eigenvalue of the uncentered covariance matrix of the linear parametrization, unlike all pre-existing O(1/k) rates in the TD(0) literature. We also introduce PCTD(0), a variant of TD(0), which benefits from better convergence properties under an additional assumption of strong mixing on the Markov Chain.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes