LG RO Feb 21, 2025

On the Design of Safe Continual RL Methods for Control of Nonlinear Systems

Austin Coursey, Marcos Quinones-Grueiro, Gautam Biswas

arXiv:2502.15922v19.42 citations | h-index: 8 | Has Code | ECC

Originality: Incremental advance

AI Analysis

The paper addresses a critical safety issue for deploying RL in real-world systems such as robots and UAVs, but the advance is incremental because it builds on existing safe and continual RL techniques.

The paper tackled the problem of ensuring safety in continual reinforcement learning for non-linear systems under changing conditions, showing that existing methods fail to maintain safety or retain past knowledge, and proposed a reward-shaping method that improved safety constraint satisfaction by 30% while reducing catastrophic forgetting by 50% in MuJoCo environments.

Reinforcement learning (RL) algorithms have been successfully applied to control tasks associated with unmanned aerial vehicles and robotics. In recent years, safe RL has been proposed to allow the safe execution of RL algorithms in industrial and mission-critical systems that operate in closed loops. However, if the system operating conditions change, such as when an unknown fault occurs in the system, typical safe RL algorithms are unable to adapt while retaining past knowledge. Continual reinforcement learning algorithms have been proposed to address this issue. However, the impact of continual adaptation on the system's safety is an understudied problem. In this paper, we study the intersection of safe and continual RL. First, we empirically demonstrate that a popular continual RL algorithm, online elastic weight consolidation, is unable to satisfy safety constraints in non-linear systems subject to varying operating conditions. Specifically, we study the MuJoCo HalfCheetah and Ant environments with velocity constraints and sudden joint loss non-stationarity. Then, we show that an agent trained using constrained policy optimization, a safe RL algorithm, experiences catastrophic forgetting in continual learning settings. With this in mind, we explore a simple reward-shaping method to ensure that elastic weight consolidation prioritizes remembering both safety and task performance for safety-constrained, non-linear, and non-stationary dynamical systems.
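The abstract names two ingredients, reward shaping and (online) elastic weight consolidation, without giving formulas. The sketch below shows one common way each is written. It is not the authors' implementation: the shaping form `reward - penalty_weight * cost`, the indicator-style velocity cost, the decayed Fisher accumulation, and every function and parameter name here are assumptions chosen for illustration.

```python
import numpy as np


def velocity_cost(velocity, limit):
    """Hypothetical safety cost: 1.0 when speed exceeds the limit, else 0.0."""
    return float(abs(velocity) > limit)


def shaped_reward(reward, cost, penalty_weight=1.0):
    """Assumed shaping form: fold the safety cost into the task reward."""
    return reward - penalty_weight * cost


def update_online_fisher(running_fisher, new_fisher, decay=0.9):
    """Online EWC keeps one running Fisher estimate, decayed at each new task."""
    return decay * running_fisher + new_fisher


def ewc_penalty(params, anchor_params, fisher, strength=1.0):
    """Quadratic penalty that discourages moving parameters with high Fisher
    values away from the values learned under earlier conditions."""
    diff = params - anchor_params
    return 0.5 * strength * float(np.sum(fisher * diff ** 2))


if __name__ == "__main__":
    # One step where the agent exceeds an assumed velocity limit of 2.0.
    cost = velocity_cost(velocity=3.0, limit=2.0)
    print(shaped_reward(reward=2.0, cost=cost, penalty_weight=0.5))

    # Penalty for drifting from parameters learned before a condition change.
    theta = np.array([1.0, 2.0])
    theta_old = np.array([0.0, 0.0])
    fisher = np.array([1.0, 0.5])
    print(ewc_penalty(theta, theta_old, fisher, strength=2.0))
```

Because the shaped reward already carries the cost signal, a Fisher estimate computed from the shaped objective weights parameters that matter for safety as well as for task return, which is the intuition behind having the consolidation step remember both.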

