A Dynamical Systems Framework for Reinforcement Learning Safety and Robustness Verification
The work addresses the lack of formal verification methods for RL in safety-critical applications and provides a novel, interpretable assessment tool.
The paper tackles the problem of verifying the safety and robustness of reinforcement learning policies for safety-critical systems. It introduces a dynamical systems framework that uses Finite-Time Lyapunov Exponents and Lagrangian Coherent Structures to identify safety barriers and failure modes, and it demonstrates that the framework can expose flaws in policies that appear successful based on reward alone.
The application of reinforcement learning to safety-critical systems is limited by the lack of formal methods for verifying the robustness and safety of learned policies. This paper introduces a novel framework that addresses this gap by analyzing the combination of an RL agent and its environment as a discrete-time autonomous dynamical system. By leveraging tools from dynamical systems theory, specifically the Finite-Time Lyapunov Exponent (FTLE), we identify and visualize Lagrangian Coherent Structures (LCS) that act as the hidden "skeleton" governing the system's behavior. We demonstrate that repelling LCS function as safety barriers around unsafe regions, while attracting LCS reveal the system's convergence properties and potential failure modes, such as unintended "trap" states. To move beyond qualitative visualization, we introduce a suite of quantitative metrics that formally measure a policy's safety margin and robustness: Mean Boundary Repulsion (MBR), Aggregated Spurious Attractor Strength (ASAS), and Temporally-Aware Spurious Attractor Strength (TASAS). We further provide a method for deriving local stability guarantees and extend the analysis to handle model uncertainty. Through experiments in both discrete and continuous control environments, we show that this framework provides a comprehensive and interpretable assessment of policy behavior, successfully identifying critical flaws in policies that appear successful based on reward alone.
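To make the central quantity concrete, the following is a minimal sketch of a forward-time FTLE computation for a discrete-time closed-loop map. It uses the standard textbook definition, FTLE = (1/T) ln σ_max(J), where J is the Jacobian of the T-step flow map and σ_max is its largest singular value. It is not taken from the paper. The names `closed_loop_step`, `flow_map`, and `ftle`, the toy linear saddle standing in for the agent-plus-environment map, and the finite-difference step `eps` are all illustrative assumptions.

```python
import numpy as np

def closed_loop_step(x):
    """Hypothetical closed-loop map x_{k+1} = f(x_k, pi(x_k)).

    Assumption: a toy linear saddle replaces a real policy and environment.
    """
    A = np.array([[2.0, 0.0],
                  [0.0, 0.5]])
    return A @ x

def flow_map(step, x0, horizon):
    """Iterate the one-step map `horizon` times starting from x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(horizon):
        x = step(x)
    return x

def ftle(step, x0, horizon, eps=1e-4):
    """Forward-time FTLE at x0 over `horizon` steps.

    The flow-map Jacobian is approximated by central finite differences
    (an assumption of this sketch; automatic differentiation would also work).
    """
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    J = np.empty((n, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        forward = flow_map(step, x0 + dx, horizon)
        backward = flow_map(step, x0 - dx, horizon)
        J[:, j] = (forward - backward) / (2.0 * eps)
    # Largest singular value of J equals sqrt(lambda_max(J^T J)),
    # the square root of the top eigenvalue of the Cauchy-Green tensor.
    sigma_max = np.linalg.norm(J, 2)
    return np.log(sigma_max) / horizon

value = ftle(closed_loop_step, [0.3, -0.2], horizon=5)
print(value)
```

For this toy map the T-step Jacobian is diag(2^T, 0.5^T), so the FTLE equals ln 2 at every state and for every horizon. In a realistic setting one would evaluate `ftle` on a grid of initial states; ridges of high forward-time FTLE are commonly treated as candidates for repelling LCS.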