LGMLFeb 3, 2019

Finite-Time Error Bounds For Linear Stochastic Approximation and TD Learning

arXiv:1902.00923v3298 citations
Originality Incremental advance
AI Analysis

This work addresses the theoretical analysis of stochastic approximation algorithms, which is important for researchers in reinforcement learning and optimization, but it is incremental as it builds on existing methods to solve a specific open problem.

The paper tackles the problem of analyzing finite-time error bounds for linear stochastic approximation algorithms, including temporal difference learning, by deriving bounds on moments of the error using Lyapunov functions, and solves an open problem by providing these bounds without requiring projection or i.i.d. noise assumptions.

We consider the dynamics of a linear stochastic approximation algorithm driven by Markovian noise, and derive finite-time bounds on the moments of the error, i.e., deviation of the output of the algorithm from the equilibrium point of an associated ordinary differential equation (ODE). We obtain finite-time bounds on the mean-square error in the case of constant step-size algorithms by considering the drift of an appropriately chosen Lyapunov function. The Lyapunov function can be interpreted either in terms of Stein's method to obtain bounds on steady-state performance or in terms of Lyapunov stability theory for linear ODEs. We also provide a comprehensive treatment of the moments of the square of the 2-norm of the approximation error. Our analysis yields the following results: (i) for a given step-size, we show that the lower-order moments can be made small as a function of the step-size and can be upper-bounded by the moments of a Gaussian random variable; (ii) we show that the higher-order moments beyond a threshold may be infinite in steady-state; and (iii) we characterize the number of samples needed for the finite-time bounds to be of the same order as the steady-state bounds. As a by-product of our analysis, we also solve the open problem of obtaining finite-time bounds for the performance of temporal difference learning algorithms with linear function approximation and a constant step-size, without requiring a projection step or an i.i.d. noise assumption.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes