Joint Differentiable Optimization and Verification for Certified Reinforcement Learning
This addresses the problem of ensuring certified properties in safety-critical control systems, representing an incremental improvement over existing verification-after-learning approaches.
The paper tackles the challenge of certifying safety and stability in model-based reinforcement learning by proposing a framework that jointly conducts learning and formal verification through differentiable bilevel optimization. Experiments show significant advantages over existing methods in finding feasible controllers with barrier and Lyapunov functions.
In model-based reinforcement learning for safety-critical control systems, it is important to formally certify system properties (e.g., safety, stability) under the learned controller. However, as existing methods typically apply formal verification \emph{after} the controller has been learned, it is sometimes difficult to obtain any certificate, even after many iterations between learning and verification. To address this challenge, we propose a framework that jointly conducts reinforcement learning and formal verification by formulating and solving a novel bilevel optimization problem, which is differentiable by the gradients from the value function and certificates. Experiments on a variety of examples demonstrate the significant advantages of our framework over the model-based stochastic value gradient (SVG) method and the model-free proximal policy optimization (PPO) method in finding feasible controllers with barrier functions and Lyapunov functions that ensure system safety and stability.