SY LG RO
Apr 8, 2023

Stable and Safe Reinforcement Learning via a Barrier-Lyapunov Actor-Critic Approach

Liqun Zhao, Konstantinos Gatsis, Antonis Papachristodoulou

arXiv:2304.04066v39.728 citations
h-index: 51
Has Code

Originality: Incremental advance

AI Analysis

This paper addresses safety and stability in RL for control systems, both of which are critical for applications such as robotics. The advance is incremental because it builds on existing CBF and CLF methods.

The paper tackles the challenge of ensuring safety and stability when reinforcement learning is used to control real-world systems. It proposes a Barrier-Lyapunov Actor-Critic (BLAC) framework that combines control barrier functions and control Lyapunov functions with actor-critic methods and adds a backup controller. In simulation, the resulting controller causes fewer safety violations than baseline algorithms and helps the system approach the desired state.

Reinforcement learning (RL) has demonstrated impressive performance in various areas such as video games and robotics. However, ensuring safety and stability, which are two critical properties from a control perspective, remains a significant challenge when using RL to control real-world systems. In this paper, we first provide definitions of safety and stability for the RL system, and then combine the control barrier function (CBF) and control Lyapunov function (CLF) methods with the actor-critic method in RL to propose a Barrier-Lyapunov Actor-Critic (BLAC) framework which helps maintain the aforementioned safety and stability for the system. In this framework, CBF constraints for safety and a CLF constraint for stability are constructed based on the data sampled from the replay buffer, and the augmented Lagrangian method is used to update the parameters of the RL-based controller. Furthermore, an additional backup controller is introduced in case the RL-based controller cannot provide valid control signals when safety and stability constraints cannot be satisfied simultaneously. Simulation results show that this framework yields a controller that can help the system approach the desired state and cause fewer violations of safety constraints compared to baseline algorithms.
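
The abstract describes CBF and CLF constraints built from sampled data and an augmented Lagrangian update, but gives no formulas. The Python sketch below illustrates one common way such pieces fit together; it is not the authors' implementation. Everything in it is an assumption made for illustration: the discrete-time constraint forms h(s') >= (1 - eta) h(s) and V(s') <= (1 - beta) V(s), the textbook augmented Lagrangian penalty for inequality constraints, the batch keys (`actor_loss`, `h`, `h_next`, `v`, `v_next`), and the switching rule in `choose_controller`.

```python
# Illustrative sketch only; names, constraint forms, and the switching
# rule are assumptions, not taken from the paper.

def cbf_residual(h_now, h_next, eta=0.1):
    """Assumed discrete-time CBF condition: g <= 0 iff h(s') >= (1 - eta) * h(s)."""
    return (1.0 - eta) * h_now - h_next


def clf_residual(v_now, v_next, beta=0.1):
    """Assumed discrete-time CLF condition: g <= 0 iff V(s') <= (1 - beta) * V(s)."""
    return v_next - (1.0 - beta) * v_now


def al_penalty(g, lam, rho):
    """Textbook augmented Lagrangian term for one inequality constraint g <= 0."""
    return (max(0.0, lam + rho * g) ** 2 - lam ** 2) / (2.0 * rho)


def update_multiplier(g, lam, rho):
    """Projected multiplier update; the multiplier stays non-negative."""
    return max(0.0, lam + rho * g)


def constrained_actor_loss(batch, lam_cbf, lam_clf, rho=1.0, eta=0.1, beta=0.1):
    """Average the residuals over a sampled batch and add both penalty terms."""
    n = len(batch)
    base = sum(b["actor_loss"] for b in batch) / n
    g_cbf = sum(cbf_residual(b["h"], b["h_next"], eta) for b in batch) / n
    g_clf = sum(clf_residual(b["v"], b["v_next"], beta) for b in batch) / n
    loss = base + al_penalty(g_cbf, lam_cbf, rho) + al_penalty(g_clf, lam_clf, rho)
    return loss, g_cbf, g_clf


def choose_controller(g_cbf, g_clf, tol=0.0):
    """Hypothetical rule: fall back to the backup controller when either
    residual exceeds the tolerance."""
    return "rl" if g_cbf <= tol and g_clf <= tol else "backup"
```

In a training loop built along these lines, the returned loss would be minimized with respect to the actor parameters, and `update_multiplier` would then be applied to each multiplier using the batch-averaged residuals.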
