LGOCMLFeb 3, 2020

Finite-Sample Analysis of Stochastic Approximation Using Smooth Convex Envelopes

arXiv:2002.00874v637 citations
AI Analysis

This provides improved theoretical guarantees for reinforcement learning algorithms, particularly benefiting off-policy TD-learning with V-trace, though it is incremental in extending existing SA analysis.

The paper tackles the finite-sample error bounds of Stochastic Approximation (SA) for contraction mappings under arbitrary norms, achieving a convergence bound with only logarithmic dependence on state-space size. It applies this to establish the first-known convergence rate for V-trace in off-policy TD-learning and recovers state-of-the-art results for on-policy TD-learning and Q-learning.

Stochastic Approximation (SA) is a popular approach for solving fixed-point equations where the information is corrupted by noise. In this paper, we consider an SA involving a contraction mapping with respect to an arbitrary norm, and show its finite-sample error bounds while using different stepsizes. The idea is to construct a smooth Lyapunov function using the generalized Moreau envelope, and show that the iterates of SA have negative drift with respect to that Lyapunov function. Our result is applicable in Reinforcement Learning (RL). In particular, we use it to establish the first-known convergence rate of the V-trace algorithm for off-policy TD-learning. Moreover, we also use it to study TD-learning in the on-policy setting, and recover the existing state-of-the-art results for $Q$-learning. Importantly, our construction results in only a logarithmic dependence of the convergence bound on the size of the state-space.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes