LGSYJan 28, 2022

Joint Differentiable Optimization and Verification for Certified Reinforcement Learning

arXiv:2201.12243v224 citations
AI Analysis

This addresses the problem of ensuring certified properties in safety-critical control systems, representing an incremental improvement over existing verification-after-learning approaches.

The paper tackles the challenge of certifying safety and stability in model-based reinforcement learning by proposing a framework that jointly conducts learning and formal verification through differentiable bilevel optimization. Experiments show significant advantages over existing methods in finding feasible controllers with barrier and Lyapunov functions.

In model-based reinforcement learning for safety-critical control systems, it is important to formally certify system properties (e.g., safety, stability) under the learned controller. However, as existing methods typically apply formal verification \emph{after} the controller has been learned, it is sometimes difficult to obtain any certificate, even after many iterations between learning and verification. To address this challenge, we propose a framework that jointly conducts reinforcement learning and formal verification by formulating and solving a novel bilevel optimization problem, which is differentiable by the gradients from the value function and certificates. Experiments on a variety of examples demonstrate the significant advantages of our framework over the model-based stochastic value gradient (SVG) method and the model-free proximal policy optimization (PPO) method in finding feasible controllers with barrier functions and Lyapunov functions that ensure system safety and stability.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes