MSACL: Multi-Step Actor-Critic Learning with Lyapunov Certificates for Exponentially Stabilizing Control
This addresses safety-critical control problems for robotics and autonomous systems, offering a novel integration of stability guarantees with efficient exploration, though it builds on existing Lyapunov-based RL methods.
The paper tackles the challenge of ensuring verifiable stability guarantees in model-free reinforcement learning for safety-critical applications by introducing MSACL, which integrates exponential stability with maximum entropy RL, resulting in consistent superiority over baselines and state-of-the-art methods across six benchmarks, including rapid convergence and robustness against uncertainties.
For safety-critical applications, model-free reinforcement learning (RL) faces numerous challenges, particularly the difficulty of establishing verifiable stability guarantees while maintaining high exploration efficiency. To address these challenges, we present Multi-Step Actor-Critic Learning with Lyapunov Certificates (MSACL), a novel approach that seamlessly integrates exponential stability with maximum entropy reinforcement learning (MERL). In contrast to existing methods that rely on complex reward engineering and single-step constraints, MSACL utilizes intuitive rewards and multi-step data for actor-critic learning. Specifically, we first introduce Exponential Stability Labels (ESLs) to categorize samples and propose a $λ$-weighted aggregation mechanism to learn Lyapunov certificates. Leveraging these certificates, we then develop a stability-aware advantage function to guide policy optimization, thereby ensuring rapid Lyapunov descent and robust state convergence. We evaluate MSACL across six benchmarks, comprising four stabilization and two high-dimensional tracking tasks. Experimental results demonstrate its consistent superiority over both standard RL baselines and state-of-the-art Lyapunov-based RL algorithms. Beyond rapid convergence, MSACL exhibits significant robustness against environmental uncertainties and remarkable generalization to unseen reference signals. The source code and benchmarking environments are available at \href{https://github.com/YuanZhe-Xing/MSACL}{https://github.com/YuanZhe-Xing/MSACL}.