LGOCMLFeb 20, 2025

A Non-Asymptotic Theory of Seminorm Lyapunov Stability: From Deterministic to Stochastic Iterative Algorithms

arXiv:2502.14208v111 citationsh-index: 27
Originality Highly original
AI Analysis

This work offers a unified theoretical framework for analyzing finite-sample bounds in reinforcement learning algorithms, such as TD(λ) and Q-learning, addressing stability and convergence in average reward settings.

The paper tackles the problem of solving fixed-point equations for seminorm-contractive operators by establishing non-asymptotic convergence results for deterministic and stochastic iterative algorithms, proving geometric convergence in deterministic settings and providing finite-sample analyses for stochastic settings with Markovian noise.

We study the problem of solving fixed-point equations for seminorm-contractive operators and establish foundational results on the non-asymptotic behavior of iterative algorithms in both deterministic and stochastic settings. Specifically, in the deterministic setting, we prove a fixed-point theorem for seminorm-contractive operators, showing that iterates converge geometrically to the kernel of the seminorm. In the stochastic setting, we analyze the corresponding stochastic approximation (SA) algorithm under seminorm-contractive operators and Markovian noise, providing a finite-sample analysis for various stepsize choices. A benchmark for equation solving is linear systems of equations, where the convergence behavior of fixed-point iteration is closely tied to the stability of linear dynamical systems. In this special case, our results provide a complete characterization of system stability with respect to a seminorm, linking it to the solution of a Lyapunov equation in terms of positive semi-definite matrices. In the stochastic setting, we establish a finite-sample analysis for linear Markovian SA without requiring the Hurwitzness assumption. Our theoretical results offer a unified framework for deriving finite-sample bounds for various reinforcement learning algorithms in the average reward setting, including TD($λ$) for policy evaluation (which is a special case of solving a Poisson equation) and Q-learning for control.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes