Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features
This addresses a theoretical gap in reinforcement learning for policy evaluation, enabling more practical applications with arbitrary features, though it is incremental as it builds on existing TD methods.
The paper tackles the problem of establishing convergence rates for linear TD(λ) reinforcement learning under arbitrary features, which previously required linearly independent features, and provides the first L² convergence rates for both discounted and average-reward settings without algorithmic changes or extra assumptions.
Linear TD($λ$) is one of the most fundamental reinforcement learning algorithms for policy evaluation. Previously, convergence rates are typically established under the assumption of linearly independent features, which does not hold in many practical scenarios. This paper instead establishes the first $L^2$ convergence rates for linear TD($λ$) operating under arbitrary features, without making any algorithmic modification or additional assumptions. Our results apply to both the discounted and average-reward settings. To address the potential non-uniqueness of solutions resulting from arbitrary features, we develop a novel stochastic approximation result featuring convergence rates to the solution set instead of a single point.