Policy Verification in Stochastic Dynamical Systems Using Logarithmic Neural Certificates
This addresses the problem of verifying safety-critical policies in stochastic systems for applications like robotics and autonomous systems, offering an incremental improvement over existing methods.
The paper tackles the verification of neural network policies in stochastic dynamical systems for reach-avoid specifications by introducing logarithmic reach-avoid supermartingales (logRASMs) and a method for computing tighter Lipschitz constant bounds, achieving verification with probabilities up to 99.9999%.
We consider the verification of neural network policies for discrete-time stochastic systems with respect to reach-avoid specifications. We use a learner-verifier procedure that learns a certificate for the specification, represented as a neural network. Verifying that this neural network certificate is a so-called reach-avoid supermartingale (RASM) proves the satisfaction of a reach-avoid specification. Existing approaches for such a verification task rely on computed Lipschitz constants of neural networks. These approaches struggle with large Lipschitz constants, especially for reach-avoid specifications with high threshold probabilities. We present two key contributions to obtain smaller Lipschitz constants than existing approaches. First, we introduce logarithmic RASMs (logRASMs), which take exponentially smaller values than RASMs and hence have lower theoretical Lipschitz constants. Second, we present a fast method to compute tighter upper bounds on Lipschitz constants based on weighted norms. Our empirical evaluation shows we can consistently verify the satisfaction of reach-avoid specifications with probabilities as high as 99.9999%.