LG AI RO SY | Dec 31, 2025

MSACL: Multi-Step Actor-Critic Learning with Lyapunov Certificates for Exponentially Stabilizing Control

Yongwei Zhang, Yuanzhe Xing, Quanyi Liang, Quan Quan, Zhikun She

arXiv:2512.24955v2 | h-index: 2 | Has Code

Originality: Highly original

AI Analysis

MSACL targets safety-critical control problems in robotics and autonomous systems. It offers a novel integration of stability guarantees with efficient exploration, though it builds on existing Lyapunov-based RL methods.

The paper tackles the challenge of establishing verifiable stability guarantees in model-free reinforcement learning for safety-critical applications. It introduces MSACL, which integrates exponential stability with maximum entropy RL. Across six benchmarks, the authors report consistent superiority over standard RL baselines and state-of-the-art Lyapunov-based methods, along with rapid convergence and robustness to environmental uncertainties.

For safety-critical applications, model-free reinforcement learning (RL) faces numerous challenges, particularly the difficulty of establishing verifiable stability guarantees while maintaining high exploration efficiency. To address these challenges, we present Multi-Step Actor-Critic Learning with Lyapunov Certificates (MSACL), a novel approach that seamlessly integrates exponential stability with maximum entropy reinforcement learning (MERL). In contrast to existing methods that rely on complex reward engineering and single-step constraints, MSACL utilizes intuitive rewards and multi-step data for actor-critic learning. Specifically, we first introduce Exponential Stability Labels (ESLs) to categorize samples and propose a $λ$-weighted aggregation mechanism to learn Lyapunov certificates. Leveraging these certificates, we then develop a stability-aware advantage function to guide policy optimization, thereby ensuring rapid Lyapunov descent and robust state convergence. We evaluate MSACL across six benchmarks, comprising four stabilization and two high-dimensional tracking tasks. Experimental results demonstrate its consistent superiority over both standard RL baselines and state-of-the-art Lyapunov-based RL algorithms. Beyond rapid convergence, MSACL exhibits significant robustness against environmental uncertainties and remarkable generalization to unseen reference signals. The source code and benchmarking environments are available at https://github.com/YuanZhe-Xing/MSACL.
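
The abstract does not give the exact form of the Exponential Stability Labels or the $λ$-weighted aggregation, so the following is only a minimal sketch of how such pieces might look, not the authors' implementation. It assumes a label is positive when the state norm after k steps stays under an exponentially decaying bound, and that multi-step Lyapunov descent violations are averaged with normalized geometric weights. The function names, the decay rate `alpha`, the overshoot constant `c`, and the hinge-style penalty are all hypothetical choices made for illustration.

```python
import math


def exponential_stability_labels(norms, alpha=0.1, c=1.0):
    """Label each step k = 1..n of one multi-step sample (assumed rule).

    norms[0] is ||x_t|| and norms[k] is ||x_{t+k}||. Step k is labeled 1
    when ||x_{t+k}|| <= c * exp(-alpha * k) * ||x_t||, otherwise 0.
    Requires at least two entries in norms.
    """
    x0 = norms[0]
    return [
        int(x <= c * math.exp(-alpha * k) * x0)
        for k, x in enumerate(norms[1:], start=1)
    ]


def lambda_weights(n, lam=0.9):
    """Normalized geometric weights, w_k proportional to lam**(k-1), k = 1..n."""
    raw = [lam ** k for k in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


def aggregated_descent_violation(values, alpha=0.1, lam=0.9):
    """Lambda-weighted average of hinge penalties on multi-step descent.

    values[0] is V(x_t) and values[k] is V(x_{t+k}) for a candidate
    Lyapunov function V. The penalty at step k is
    max(0, V(x_{t+k}) - exp(-alpha * k) * V(x_t)); this form is an
    assumption made for illustration, not taken from the paper.
    """
    v0 = values[0]
    violations = [
        max(0.0, v - math.exp(-alpha * k) * v0)
        for k, v in enumerate(values[1:], start=1)
    ]
    weights = lambda_weights(len(violations), lam)
    return sum(w * viol for w, viol in zip(weights, violations))


# Example: the second step overshoots the decaying bound, the others do not.
labels = exponential_stability_labels([1.0, 0.8, 0.9, 0.5], alpha=0.1)
```

Under these assumptions, a penalty of zero means every step in the window satisfied the exponential descent bound, and a smaller `lam` concentrates the weight on the earliest steps of the window.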
