RO · Oct 19, 2022
Provably Safe Reinforcement Learning via Action Projection using Reachability Analysis and Polynomial Zonotopes
Niklas Kochdumper, Hanna Krasowski, Xiao Wang et al.
While reinforcement learning produces very promising results for many applications, its main disadvantage is the lack of safety guarantees, which prevents its use in safety-critical systems. In this work, we address this issue by a safety shield for nonlinear continuous systems that solve reach-avoid tasks. Our safety shield prevents applying potentially unsafe actions from a reinforcement learning agent by projecting the proposed action to the closest safe action. This approach is called action projection and is implemented via mixed-integer optimization. The safety constraints for action projection are obtained by applying parameterized reachability analysis using polynomial zonotopes, which makes it possible to accurately capture the nonlinear effects of the actions on the system. In contrast to other state-of-the-art approaches for action projection, our safety shield can efficiently handle input constraints and dynamic obstacles, eases incorporation of the spatial robot dimensions into the safety constraints, guarantees robust safety despite process noise and measurement errors, and is well suited for high-dimensional systems, as we demonstrate on several challenging benchmark systems.
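The idea of projecting a proposed action to the closest safe action can be illustrated with a minimal sketch. It assumes the safe action set is already given as a union of axis-aligned boxes; in the paper the constraints come from reachability analysis with polynomial zonotopes and the projection is solved by mixed-integer optimization, neither of which is reproduced here. The names `project_onto_box` and `project_action` are hypothetical.

```python
from math import inf
from typing import List, Sequence, Tuple

# Assumption: each safe region is a box, one (lo, hi) interval per action dimension.
Box = Sequence[Tuple[float, float]]


def project_onto_box(action: Sequence[float], box: Box) -> List[float]:
    """Closest point to `action` inside one box (component-wise clipping)."""
    return [min(max(a, lo), hi) for a, (lo, hi) in zip(action, box)]


def project_action(action: Sequence[float], safe_boxes: Sequence[Box]) -> List[float]:
    """Closest safe action in Euclidean distance over a union of boxes.

    Enumerating the boxes plays the role that the integer variables play
    in a mixed-integer formulation: exactly one region is selected.
    """
    best, best_dist = None, inf
    for box in safe_boxes:
        candidate = project_onto_box(action, box)
        dist = sum((a - c) ** 2 for a, c in zip(action, candidate))
        if dist < best_dist:
            best, best_dist = candidate, dist
    if best is None:
        raise ValueError("no safe action available")
    return best


# Two disjoint safe regions for a 2D action; the gap between them is unsafe.
safe = [
    [(-1.0, -0.2), (-1.0, 1.0)],
    [(0.8, 1.0), (-1.0, 1.0)],
]
print(project_action([0.5, 0.0], safe))
```

An action that is already safe is returned unchanged, so the shield only intervenes when the agent proposes something outside the safe set.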
CV · Jul 6, 2022
Open- and Closed-Loop Neural Network Verification using Polynomial Zonotopes
Niklas Kochdumper, Christian Schilling, Matthias Althoff et al.
We present a novel approach to efficiently compute tight non-convex enclosures of the image through neural networks with ReLU, sigmoid, or hyperbolic tangent activation functions. In particular, we abstract the input-output relation of each neuron by a polynomial approximation, which is evaluated in a set-based manner using polynomial zonotopes. While our approach can also be beneficial for open-loop neural network verification, our main application is reachability analysis of neural network controlled systems, where polynomial zonotopes are able to capture the non-convexity caused by the neural network as well as the system dynamics. This results in a superior performance compared to other methods, as we demonstrate on various benchmarks.
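As a rough illustration of abstracting a neuron by a polynomial plus an error bound, the sketch below covers only the simplest case: a degree-1 approximation of ReLU over a scalar input interval. This is the standard linear relaxation, not the paper's method, which uses higher-order polynomials and polynomial zonotopes. The function name `relu_linear_enclosure` is hypothetical.

```python
from typing import Tuple


def relu(x: float) -> float:
    return max(0.0, x)


def relu_linear_enclosure(lo: float, hi: float) -> Tuple[float, float, float]:
    """Return (slope, offset, err) such that for every x in [lo, hi]:

        |relu(x) - (slope * x + offset)| <= err
    """
    if hi <= 0.0:          # neuron always inactive: relu(x) = 0
        return 0.0, 0.0, 0.0
    if lo >= 0.0:          # neuron always active: relu(x) = x
        return 1.0, 0.0, 0.0
    # Unstable neuron: relu lies between the line slope*x (below)
    # and the chord through (lo, 0) and (hi, hi) (above).
    slope = hi / (hi - lo)
    err = -slope * lo / 2.0
    return slope, err, err


slope, offset, err = relu_linear_enclosure(-1.0, 1.0)
print(slope, offset, err)
```

The approximation error becomes an extra bounded term in the set representation, which is how an enclosure stays sound even though the polynomial only approximates the activation.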
LG · Jan 14, 2023
First Three Years of the International Verification of Neural Networks Competition (VNN-COMP)
Christopher Brix, Mark Niklas Müller, Stanley Bak et al.
This paper presents a summary and meta-analysis of the first three iterations of the annual International Verification of Neural Networks Competition (VNN-COMP) held in 2020, 2021, and 2022. In the VNN-COMP, participants submit software tools that analyze whether given neural networks satisfy specifications describing their input-output behavior. These neural networks and specifications cover a variety of problem classes and tasks, corresponding to safety and robustness properties in image classification, neural control, reinforcement learning, and autonomous systems. We summarize the key processes, rules, and results, present trends observed over the last three years, and provide an outlook into possible future developments.
LG · Dec 20, 2022
The Third International Verification of Neural Networks Competition (VNN-COMP 2022): Summary and Results
Mark Niklas Müller, Christopher Brix, Stanley Bak et al. · eth-zurich
This report summarizes the 3rd International Verification of Neural Networks Competition (VNN-COMP 2022), held as a part of the 5th Workshop on Formal Methods for ML-Enabled Autonomous Systems (FoMLAS), which was collocated with the 34th International Conference on Computer-Aided Verification (CAV). VNN-COMP is held annually to facilitate the fair and objective comparison of state-of-the-art neural network verification tools, encourage the standardization of tool interfaces, and bring together the neural network verification community. To this end, standardized formats for networks (ONNX) and specification (VNN-LIB) were defined, tools were evaluated on equal-cost hardware (using an automatic evaluation pipeline based on AWS instances), and tool parameters were chosen by the participants before the final test sets were made public. In the 2022 iteration, 11 teams participated on a diverse set of 12 scored benchmarks. This report summarizes the rules, benchmarks, participating tools, results, and lessons learned from this iteration of this competition.
NA · Mar 5, 2019
Numerical Verification of Affine Systems with up to a Billion Dimensions
Stanley Bak, Hoang-Dung Tran, Taylor T. Johnson
Affine systems reachability is the basis of many verification methods. With further computation, methods exist to reason about richer models with inputs, nonlinear differential equations, and hybrid dynamics. As such, the scalability of affine systems verification is a prerequisite to scalable analysis for more complex systems. In this paper, we improve the scalability of affine systems verification, in terms of the number of dimensions (variables) in the system. The reachable states of affine systems can be written in terms of the matrix exponential, and safety checking can be performed at specific time steps with linear programming. Unfortunately, for large systems with many state variables, this direct approach requires an intractable amount of memory while using an intractable amount of computation time. We overcome these challenges by combining several methods that leverage common problem structure. Memory is reduced by exploiting initial states that are not full-dimensional and safety properties (outputs) over a few linear projections of the state variables. Computation time is saved by using numerical simulations to compute only projections of the matrix exponential relevant for the verification problem. Since large systems often have sparse dynamics, we use Krylov-subspace simulation approaches based on the Arnoldi or Lanczos iterations. Our method produces accurate counter-examples when properties are violated and, in the extreme case with sufficient problem structure, can analyze a system with one billion real-valued state variables.
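The direct approach described above (matrix exponential plus a linear program at a specific time step) can be sketched for a tiny system. This is a minimal illustration under stated assumptions: the dynamics are linear, x' = Ax; the initial set is a box, so the linear program has a closed-form solution; and the matrix exponential is computed with a dense truncated Taylor series instead of the Krylov-subspace simulations used in the paper. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_exp(m: Matrix, terms: int = 40) -> Matrix:
    """Truncated Taylor series sum_k m^k / k! (fine for small, well-scaled m)."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def max_output_at_time(a: Matrix, t: float, c: Sequence[float],
                       init_box: Sequence[Tuple[float, float]]) -> float:
    """Maximum of c^T x(t) over all x(0) in init_box, where x(t) = e^{At} x(0)."""
    n = len(a)
    phi = mat_exp([[v * t for v in row] for row in a])
    # Only the projection c^T e^{At} is needed for the safety check.
    row = [sum(c[i] * phi[i][j] for i in range(n)) for j in range(n)]
    # Maximizing a linear function over a box: pick the bound matching each sign.
    return sum(w * (hi if w > 0 else lo) for w, (lo, hi) in zip(row, init_box))


# Harmonic oscillator; check the safety property x_1(t) <= 0.5 at t = pi/2.
A = [[0.0, 1.0], [-1.0, 0.0]]
bound = max_output_at_time(A, 1.5707963267948966, [1.0, 0.0],
                           [(0.9, 1.1), (-0.1, 0.1)])
print(bound <= 0.5)
```

Because the property depends on a single output direction `c`, only one row of the matrix exponential is used, which reflects the memory-saving observation in the abstract about safety properties over a few linear projections.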
LG · Dec 16, 2022
Provable Fairness for Neural Network Models using Formal Verification
Giorgian Borca-Tasciuc, Xingzhi Guo, Stanley Bak et al.
Machine learning models are increasingly deployed for critical decision-making tasks, making it important to verify that they do not contain gender or racial biases picked up from training data. Typical approaches to achieve fairness revolve around efforts to clean or curate training data, with post-hoc statistical evaluation of the fairness of the model on evaluation data. In contrast, we propose techniques to prove fairness using recently developed formal methods that verify properties of neural network models. Beyond the strength of guarantee implied by a formal proof, our methods have the advantage that we do not need explicit training or evaluation data (which is often proprietary) in order to analyze a given trained model. In experiments on two familiar datasets in the fairness literature (COMPAS and ADULTS), we show that through proper training, we can reduce unfairness by an average of 65.4% at a cost of less than 1% in AUC score.
LG · Dec 22, 2025
The 6th International Verification of Neural Networks Competition (VNN-COMP 2025): Summary and Results
Konstantin Kaulen, Tobias Ladner, Stanley Bak et al.
This report summarizes the 6th International Verification of Neural Networks Competition (VNN-COMP 2025), held as a part of the 8th International Symposium on AI Verification (SAIV), that was collocated with the 37th International Conference on Computer-Aided Verification (CAV). VNN-COMP is held annually to facilitate the fair and objective comparison of state-of-the-art neural network verification tools, encourage the standardization of tool interfaces, and bring together the neural network verification community. To this end, standardized formats for networks (ONNX) and specification (VNN-LIB) were defined, tools were evaluated on equal-cost hardware (using an automatic evaluation pipeline based on AWS instances), and tool parameters were chosen by the participants before the final test sets were made public. In the 2025 iteration, 8 teams participated on a diverse set of 16 regular and 9 extended benchmarks. This report summarizes the rules, benchmarks, participating tools, results, and lessons learned from this iteration of this competition.
LG · Dec 28, 2023
The Fourth International Verification of Neural Networks Competition (VNN-COMP 2023): Summary and Results
Christopher Brix, Stanley Bak, Changliu Liu et al.
This report summarizes the 4th International Verification of Neural Networks Competition (VNN-COMP 2023), held as a part of the 6th Workshop on Formal Methods for ML-Enabled Autonomous Systems (FoMLAS), that was collocated with the 35th International Conference on Computer-Aided Verification (CAV). VNN-COMP is held annually to facilitate the fair and objective comparison of state-of-the-art neural network verification tools, encourage the standardization of tool interfaces, and bring together the neural network verification community. To this end, standardized formats for networks (ONNX) and specification (VNN-LIB) were defined, tools were evaluated on equal-cost hardware (using an automatic evaluation pipeline based on AWS instances), and tool parameters were chosen by the participants before the final test sets were made public. In the 2023 iteration, 7 teams participated on a diverse set of 10 scored and 4 unscored benchmarks. This report summarizes the rules, benchmarks, participating tools, results, and lessons learned from this iteration of this competition.
LG · Dec 28, 2024
The Fifth International Verification of Neural Networks Competition (VNN-COMP 2024): Summary and Results
Christopher Brix, Stanley Bak, Taylor T. Johnson et al.
This report summarizes the 5th International Verification of Neural Networks Competition (VNN-COMP 2024), held as a part of the 7th International Symposium on AI Verification (SAIV), that was collocated with the 36th International Conference on Computer-Aided Verification (CAV). VNN-COMP is held annually to facilitate the fair and objective comparison of state-of-the-art neural network verification tools, encourage the standardization of tool interfaces, and bring together the neural network verification community. To this end, standardized formats for networks (ONNX) and specification (VNN-LIB) were defined, tools were evaluated on equal-cost hardware (using an automatic evaluation pipeline based on AWS instances), and tool parameters were chosen by the participants before the final test sets were made public. In the 2024 iteration, 8 teams participated on a diverse set of 12 regular and 8 extended benchmarks. This report summarizes the rules, benchmarks, participating tools, results, and lessons learned from this iteration of this competition.
NA · Jan 17, 2022
Neural Network Compression of ACAS Xu Early Prototype is Unsafe: Closed-Loop Verification through Quantized State Backreachability
Stanley Bak, Hoang-Dung Tran
ACAS Xu is an air-to-air collision avoidance system designed for unmanned aircraft that issues horizontal turn advisories to avoid an intruder aircraft. Due to the use of a large lookup table in the design, a neural network compression of the policy was proposed. Analysis of this system has spurred a significant body of research in the formal methods community on neural network verification. While many powerful methods have been developed, most work focuses on open-loop properties of the networks, rather than the main point of the system -- collision avoidance -- which requires closed-loop analysis. In this work, we develop a technique to verify a closed-loop approximation of the system using state quantization and backreachability. We use favorable assumptions for the analysis -- perfect sensor information, instant following of advisories, ideal aircraft maneuvers and an intruder that only flies straight. When the method fails to prove the system is safe, we refine the quantization parameters until generating counterexamples where the original (non-quantized) system also has collisions.
RO · Oct 3, 2021
Stress Testing Autonomous Racing Overtake Maneuvers with RRT
Stanley Bak, Johannes Betz, Abhinav Chawla et al.
High-performance autonomy often must operate at the boundaries of safety. When external agents are present in a system, the process of ensuring safety without sacrificing performance becomes extremely difficult. In this paper, we present an approach to stress test such systems based on the rapidly exploring random tree (RRT) algorithm. We propose to find faults in such systems through adversarial agent perturbations, where the behaviors of other agents in an otherwise fixed scenario are modified. This creates a large search space of possibilities, which we explore both randomly and with a focused strategy that runs RRT in a bounded projection of the observable states that we call the objective space. The approach is applied to generate tests for evaluating overtaking logic and path planning algorithms in autonomous racing, where the vehicles are driving at high speed in an adversarial environment. We evaluate several autonomous racing path planners, finding numerous collisions during overtake maneuvers in all planners. The focused RRT search finds several times more crashes than the random strategy, and, for certain planners, tens to hundreds of times more crashes in the second half of the track.
LO · Aug 31, 2021
The Second International Verification of Neural Networks Competition (VNN-COMP 2021): Summary and Results
Stanley Bak, Changliu Liu, Taylor Johnson
This report summarizes the second International Verification of Neural Networks Competition (VNN-COMP 2021), held as a part of the 4th Workshop on Formal Methods for ML-Enabled Autonomous Systems that was collocated with the 33rd International Conference on Computer-Aided Verification (CAV). Twelve teams participated in this competition. The goal of the competition is to provide an objective comparison of the state-of-the-art methods in neural network verification, in terms of scalability and speed. Along this line, we used standard formats (ONNX for neural networks and VNNLIB for specifications), standard hardware (all tools are run by the organizers on AWS), and tool parameters provided by the tool authors. This report summarizes the rules, benchmarks, participating tools, results, and lessons learned from this competition.
SE · Feb 24, 2021
The Black-Box Simplex Architecture for Runtime Assurance of Autonomous CPS
Usama Mehmood, Sanaz Sheikhi, Stanley Bak et al.
The Simplex Architecture is a runtime assurance framework where control authority may switch from an unverified and potentially unsafe advanced controller to a backup baseline controller in order to maintain the safety of an autonomous cyber-physical system. In this work, we show that runtime checks can replace the requirement to statically verify safety of the baseline controller. This is important as there are many powerful control techniques, such as model-predictive control and neural network controllers, that work well in practice but are difficult to statically verify. Since the method does not use internal information about the advanced or baseline controller, we call the approach the Black-Box Simplex Architecture. We prove the architecture is safe and present two case studies where (i) model-predictive control provides safe multi-robot coordination, and (ii) neural networks provably prevent collisions in groups of F-16 aircraft, despite the controllers occasionally outputting unsafe commands.
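The switching logic of a Simplex-style decision module with runtime checks can be sketched as follows. Everything concrete here is an assumption made for illustration: the one-dimensional plant, the proportional baseline controller, the safety limit, and the check that simulates the advanced command followed by a finite horizon of baseline commands. The paper's actual checks and safety proof are not reproduced.

```python
from typing import Callable

LIMIT = 1.0  # hypothetical safety limit: the state must satisfy |x| <= LIMIT


def step(x: float, u: float) -> float:
    """Hypothetical discrete-time plant: x_{k+1} = x_k + u_k."""
    return x + u


def baseline(x: float) -> float:
    """Hypothetical baseline controller that steers the state toward zero."""
    return -0.5 * x


def is_safe(x: float) -> bool:
    return abs(x) <= LIMIT


def decision_module(x: float, u_adv: float,
                    backup: Callable[[float], float] = baseline,
                    horizon: int = 5) -> float:
    """Accept the advanced command only if a runtime check passes.

    The check simulates the advanced command for one step and then the
    backup controller for `horizon` steps; both controllers are treated
    as black boxes that are only queried for commands.
    """
    state = step(x, u_adv)
    trajectory = [state]
    for _ in range(horizon):
        state = step(state, backup(state))
        trajectory.append(state)
    if all(is_safe(s) for s in trajectory):
        return u_adv
    return backup(x)  # fall back to the baseline command


print(decision_module(0.5, 0.2))   # advanced command accepted
print(decision_module(0.5, 0.9))   # rejected, baseline command used
```

The decision module never inspects how either controller computes its command, which is the sense in which the architecture is black-box.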
SY · Apr 12, 2020
NNV: The Neural Network Verification Tool for Deep Neural Networks and Learning-Enabled Cyber-Physical Systems
Hoang-Dung Tran, Xiaodong Yang, Diego Manzanas Lopez et al.
This paper presents the Neural Network Verification (NNV) software tool, a set-based verification framework for deep neural networks (DNNs) and learning-enabled cyber-physical systems (CPS). The crux of NNV is a collection of reachability algorithms that make use of a variety of set representations, such as polyhedra, star sets, zonotopes, and abstract-domain representations. NNV supports both exact (sound and complete) and over-approximate (sound) reachability algorithms for verifying safety and robustness properties of feed-forward neural networks (FFNNs) with various activation functions. For learning-enabled CPS, such as closed-loop control systems incorporating neural networks, NNV provides exact and over-approximate reachability analysis schemes for linear plant models and FFNN controllers with piecewise-linear activation functions, such as ReLUs. For similar neural network control systems (NNCS) that instead have nonlinear plant models, NNV supports over-approximate analysis by combining the star set analysis used for FFNN controllers with zonotope-based analysis for nonlinear plant dynamics building on CORA. We evaluate NNV using two real-world case studies: the first is safety verification of ACAS Xu networks and the second deals with the safety verification of a deep learning-based adaptive cruise control system.
LG · Apr 12, 2020
Verification of Deep Convolutional Neural Networks Using ImageStars
Hoang-Dung Tran, Stanley Bak, Weiming Xiang et al.
Convolutional Neural Networks (CNN) have redefined the state-of-the-art in many real-world applications, such as facial recognition, image classification, human pose estimation, and semantic segmentation. Despite their success, CNNs are vulnerable to adversarial attacks, where slight changes to their inputs may lead to sharp changes in their output in even well-trained networks. Set-based analysis methods can detect or prove the absence of bounded adversarial attacks, which can then be used to evaluate the effectiveness of neural network training methodology. Unfortunately, existing verification approaches have limited scalability in terms of the size of networks that can be analyzed. In this paper, we describe a set-based framework that successfully deals with real-world CNNs, such as VGG16 and VGG19, that have high accuracy on ImageNet. Our approach is based on a new set representation called the ImageStar, which enables efficient exact and over-approximative analysis of CNNs. ImageStars perform efficient set-based analysis by combining operations on concrete images with linear programming (LP). Our approach is implemented in a tool called NNV, and can verify the robustness of VGG networks with respect to a small set of input states, derived from adversarial attacks, such as the DeepFool attack. The experimental results show that our approach is less conservative and faster than existing zonotope methods, such as those used in DeepZ, and the polytope method used in DeepPoly.
SY · Jun 24, 2018
Cyber-Physical Specification Mismatches
Luan V. Nguyen, Khaza Anuarul Hoque, Stanley Bak et al.
Embedded systems use increasingly complex software and are evolving into cyber-physical systems (CPS) with sophisticated interaction and coupling between physical and computational processes. Many CPS operate in safety-critical environments and have stringent certification, reliability, and correctness requirements. These systems undergo changes throughout their lifetimes, where either the software or physical hardware is updated in subsequent design iterations. One source of failure in safety-critical CPS is when there are unstated assumptions in either the physical or cyber parts of the system, and new components do not match those assumptions. In this work, we present an automated method towards identifying unstated assumptions in CPS. Dynamic specifications in the form of candidate invariants of both the software and physical components are identified using dynamic analysis (executing and/or simulating the system implementation or model thereof). A prototype tool called Hynger (for HYbrid iNvariant GEneratoR) was developed that instruments Simulink/Stateflow (SLSF) model diagrams to generate traces in the input format compatible with the Daikon invariant inference tool, which has been extensively applied to software systems. Hynger, in conjunction with Daikon, is able to detect candidate invariants of several CPS case studies. We use the running example of a DC-to-DC power converter, and demonstrate that Hynger can detect a specification mismatch where a tolerance assumed by the software is violated due to a plant change. Another case study of an automotive control system is also introduced to illustrate the power of Hynger and Daikon in automatically identifying cyber-physical specification mismatches.
CR · Feb 26, 2012
S3A: Secure System Simplex Architecture for Enhanced Security of Cyber-Physical Systems
Sibin Mohan, Stanley Bak, Emiliano Betti et al.
Until recently, cyber-physical systems, especially those with safety-critical properties that manage critical infrastructure (e.g., power generation plants and water treatment facilities), were considered to be invulnerable against software security breaches. The recently discovered 'W32.Stuxnet' worm has drastically changed this perception by demonstrating that such systems are susceptible to external attacks. Here we present an architecture that enhances the security of safety-critical cyber-physical systems despite the presence of such malware. Our architecture uses the property that control systems have deterministic execution behavior to detect an intrusion within 0.6 μs while still guaranteeing the safety of the plant. We also show that even if an attack is successful, the overall state of the physical system will still remain safe. Even if the operating system's administrative privileges have been compromised, our architecture will still be able to protect the physical system from coming to harm.