SY LG RO | Sep 11, 2025

Off Policy Lyapunov Stability in Reinforcement Learning

arXiv:2509.09863v1 | 1 citation | h-index: 1
Originality: Incremental advance
AI Analysis

This paper addresses stability certification for reinforcement learning practitioners, but the advance is incremental: it builds on existing methods such as Soft Actor Critic and Proximal Policy Optimization.

The paper tackles the sample inefficiency of reinforcement learning algorithms that learn Lyapunov functions for stability guarantees. It introduces an off-policy method for learning the Lyapunov function, which improves performance in simulations of an inverted pendulum and a quadrotor.

Traditional reinforcement learning lacks the ability to provide stability guarantees. More recent algorithms learn Lyapunov functions alongside the control policies to ensure stable learning. However, the current self-learned Lyapunov functions are sample inefficient due to their on-policy nature. This paper introduces a method for learning Lyapunov functions off-policy and incorporates the proposed off-policy Lyapunov function into the Soft Actor Critic and Proximal Policy Optimization algorithms to provide them with a data efficient stability certificate. Simulations of an inverted pendulum and a quadrotor illustrate the improved performance of the two algorithms when endowed with the proposed off-policy Lyapunov function.
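
To make the off-policy idea in the abstract above concrete, here is a minimal sketch of checking a Lyapunov decrease condition on transitions stored in a replay buffer. It is not the paper's formulation. The quadratic candidate `lyapunov_candidate`, the linear feedback `policy`, the toy dynamics `s' = 0.9*s + 0.5*a`, and the decrease rate `alpha` are all hypothetical choices made for illustration. The point it shows is that the stored action need not come from the current policy, because the current policy is only queried at the next state.

```python
import random


def lyapunov_candidate(state, action, p=1.0, r=0.1):
    """Hypothetical quadratic candidate L(s, a) = p*s^2 + r*a^2 (scalar s and a)."""
    return p * state * state + r * action * action


def policy(state, gain=0.5):
    """Hypothetical current policy: linear state feedback a = -gain * s."""
    return -gain * state


def decrease_violation(buffer, alpha=0.1):
    """Mean hinge violation of L(s', pi(s')) <= (1 - alpha) * L(s, a).

    The buffer holds (state, action, next_state) tuples that may have been
    collected by any behaviour policy; only the next action is re-evaluated
    with the current policy.
    """
    total = 0.0
    for state, action, next_state in buffer:
        current = lyapunov_candidate(state, action)
        nxt = lyapunov_candidate(next_state, policy(next_state))
        total += max(0.0, nxt - (1.0 - alpha) * current)
    return total / len(buffer)


# Fill a replay buffer with exploratory transitions on an assumed toy system.
rng = random.Random(0)
buffer = []
for _ in range(1000):
    s = rng.uniform(-1.0, 1.0)
    a = policy(s) + rng.uniform(-0.2, 0.2)  # behaviour differs from the policy
    buffer.append((s, a, 0.9 * s + 0.5 * a))

violation = decrease_violation(buffer)
```

In a learning setting, a quantity like `violation` could serve as a penalty term when fitting the candidate's parameters or constraining the policy update. How the paper actually couples its Lyapunov function to Soft Actor Critic and Proximal Policy Optimization is not specified in this summary.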

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
