ST LGOct 27, 2021

The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning

Vivek Borkar, Shuhang Chen, Adithya Devraj, Ioannis Kontoyiannis, Sean Meyn

arXiv:2110.14427v612.249 citations

Originality Highly original

AI Analysis

This provides foundational theoretical guarantees for stochastic approximation and reinforcement learning methods, addressing a key bottleneck in handling parameter-dependent noise.

The paper tackles the problem of establishing asymptotic statistics for stochastic approximation algorithms with parameter-dependent noise, proving convergence in L4, functional central limit theorems, and covariance convergence, with an example showing divergence of second moments under certain conditions.

The paper concerns the $d$-dimensional stochastic approximation recursion, $$ θ_{n+1}= θ_n + α_{n + 1} f(θ_n, Φ_{n+1}) $$ where $ \{ Φ_n \}$ is a stochastic process on a general state space, satisfying a conditional Markov property that allows for parameter-dependent noise. The main results are established under additional conditions on the mean flow and a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3): (i) An appropriate Lyapunov function is constructed that implies convergence of the estimates in $L_4$. (ii) A functional central limit theorem (CLT) is established, as well as the usual one-dimensional CLT for the normalized error. Moment bounds combined with the CLT imply convergence of the normalized covariance $\textsf{E}[ z_n z_n^T ]$ to the asymptotic covariance in the CLT, where $z_n =: (θ_n-θ^*)/\sqrt{α_n}$. (iii) The CLT holds for the normalized version $z^{\text{PR}}_n =: \sqrt{n} [θ^{\text{PR}}_n -θ^*]$, of the averaged parameters $θ^{\text{PR}}_n =:n^{-1} \sum_{k=1}^nθ_k$, subject to standard assumptions on the step-size. Moreover, the covariance in the CLT coincides with the minimal covariance of Polyak and Ruppert. (iv) An example is given where $f$ and $\bar{f}$ are linear in $θ$, and $Φ$ is a geometrically ergodic Markov chain but does not satisfy (DV3). While the algorithm is convergent, the second moment of $θ_n$ is unbounded and in fact diverges. This arXiv version represents a major extension of the results in prior versions.The main results now allow for parameter-dependent noise, as is often the case in applications to reinforcement learning.

View on arXiv PDF

Similar