MLLGOct 21, 2024

Statistical Inference for Temporal Difference Learning with Linear Function Approximation

arXiv:2410.16106v413 citationsh-index: 9
Originality Incremental advance
AI Analysis

This work provides improved statistical inference tools for reinforcement learning practitioners, offering incremental theoretical advancements over existing methods.

The paper tackles the problem of estimating parameters in Temporal Difference learning with linear function approximation by deriving sharper high probability convergence guarantees, refined Berry-Esseen bounds, and an efficient online estimator for asymptotic covariance, enabling confidence region construction with finite-sample coverage.

We investigate the statistical properties of Temporal Difference (TD) learning with Polyak-Ruppert averaging, arguably one of the most widely used algorithms in reinforcement learning, for the task of estimating the parameters of the optimal linear approximation to the value function. Assuming independent samples, we make three theoretical contributions that improve upon the current state-of-the-art results: (i) we derive sharper high probability convergence guarantees that depend explicitly on the asymptotic variance and hold under weaker conditions than those adopted in the literature; (ii) we establish refined high-dimensional Berry-Esseen bounds over the class of convex sets, achieving faster rates than the best known results, and (iii) we propose and analyze a novel, computationally efficient online plug-in estimator of the asymptotic covariance matrix. These results enable the construction of confidence regions and simultaneous confidence intervals for the linear parameters of the value function approximation, with guaranteed finite-sample coverage. We demonstrate the applicability of our theoretical findings through numerical experiments.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes