LG OCSep 30, 2025

Extensions of Robbins-Siegmund Theorem with Applications in Reinforcement Learning

arXiv:2509.26442v18 citationsh-index: 3

Originality Incremental advance

AI Analysis

This work addresses a foundational theoretical bottleneck in stochastic approximation and reinforcement learning, enabling analysis of algorithms where previous conditions were too restrictive, though it is incremental as it builds on an existing theorem.

The authors tackled the limitation of the Robbins-Siegmund theorem, which requires a summable zero-order term, by extending it to allow square summable conditions, enabling almost sure convergence to a bounded set and providing convergence rates and bounds. They applied this to reinforcement learning, achieving the first almost sure convergence rate, high probability concentration bound, and L^p convergence rate for Q-learning with linear function approximation.

The Robbins-Siegmund theorem establishes the convergence of stochastic processes that are almost supermartingales and is foundational for analyzing a wide range of stochastic iterative algorithms in stochastic approximation and reinforcement learning (RL). However, its original form has a significant limitation as it requires the zero-order term to be summable. In many important RL applications, this summable condition, however, cannot be met. This limitation motivates us to extend the Robbins-Siegmund theorem for almost supermartingales where the zero-order term is not summable but only square summable. Particularly, we introduce a novel and mild assumption on the increments of the stochastic processes. This together with the square summable condition enables an almost sure convergence to a bounded set. Additionally, we further provide almost sure convergence rates, high probability concentration bounds, and $L^p$ convergence rates. We then apply the new results in stochastic approximation and RL. Notably, we obtain the first almost sure convergence rate, the first high probability concentration bound, and the first $L^p$ convergence rate for $Q$-learning with linear function approximation.

View on arXiv PDF

Similar