LG OC MLJun 1, 2025

A Finite-Time Analysis of TD Learning with Linear Function Approximation without Projections or Strong Convexity

arXiv:2506.01052v211.43 citationsh-index: 3

Originality Highly original

AI Analysis

This provides a more practical convergence guarantee for reinforcement learning algorithms, addressing a limitation in prior theoretical work.

The paper tackles the problem of ensuring convergence in Temporal Difference (TD) learning with linear function approximation without relying on artificial projection assumptions, and shows that a projection-free variant converges with a rate of O(||θ*||²/√T) even under Markovian noise.

We investigate the finite-time convergence properties of Temporal Difference (TD) learning with linear function approximation, a cornerstone algorithm in the field of reinforcement learning. We are interested in the so-called ``robust'' setting, where the convergence guarantee does not depend on the minimal curvature of the potential function. While prior work has established convergence guarantees in this setting, these results typically rely on the assumption that each iterate is projected onto a bounded set, a condition that is both artificial and does not match the current practice. In this paper, we challenge the necessity of such an assumption and present a refined analysis of TD learning. For the first time, we show that the simple projection-free variant converges with a rate of $\widetilde{\mathcal{O}}(\frac{||θ^*||^2_2}{\sqrt{T}})$, even in the presence of Markovian noise. Our analysis reveals a novel self-bounding property of the TD updates and exploits it to guarantee bounded iterates.

View on arXiv PDF

Similar