LOJul 12, 2023
Towards a Certified Proof Checker for Deep Neural Network VerificationRemi Desmartin, Omri Isac, Grant Passmore et al.
Recent developments in deep neural networks (DNNs) have led to their adoption in safety-critical systems, which in turn has heightened the need for guaranteeing their safety. These safety properties of DNNs can be proven using tools developed by the verification community. However, these tools are themselves prone to implementation bugs and numerical stability problems, which make their reliability questionable. To overcome this, some verifiers produce proofs of their results which can be checked by a trusted checker. In this work, we present a novel implementation of a proof checker for DNN verification. It improves on existing implementations by offering numerical stability and greater verifiability. To achieve this, we leverage two key capabilities of Imandra, an industrial theorem prover: its support of infinite precision real arithmetic and its formal verification infrastructure. So far, we have implemented a proof checker in Imandra, specified its correctness properties and started to verify the checker's compliance with them. Our ongoing work focuses on completing the formal verification of the checker and further optimizing its performance.
LOMar 19, 2023
Logic of Differentiable Logics: Towards a Uniform Semantics of DLNatalia Ślusarz, Ekaterina Komendantskaya, Matthew L. Daggitt et al.
Differentiable logics (DL) have recently been proposed as a method of training neural networks to satisfy logical specifications. A DL consists of a syntax in which specifications are stated and an interpretation function that translates expressions in the syntax into loss functions. These loss functions can then be used during training with standard gradient descent algorithms. The variety of existing DLs and the differing levels of formality with which they are treated makes a systematic comparative study of their properties and implementations difficult. This paper remedies this problem by suggesting a meta-language for defining DLs that we call the Logic of Differentiable Logics, or LDL. Syntactically, it generalises the syntax of existing DLs to FOL, and for the first time introduces the formalism for reasoning about vectors and learners. Semantically, it introduces a general interpretation function that can be instantiated to define loss functions arising from different existing DLs. We use LDL to establish several theoretical properties of existing DLs, and to conduct their empirical study in neural network verification.
CLJun 21, 2022
Why Robust Natural Language Understanding is a ChallengeMarco Casadio, Ekaterina Komendantskaya, Verena Rieser et al.
With the proliferation of Deep Machine Learning into real-life applications, a particular property of this technology has been brought to attention: robustness Neural Networks notoriously present low robustness and can be highly sensitive to small input perturbations. Recently, many methods for verifying networks' general properties of robustness have been proposed, but they are mostly applied in Computer Vision. In this paper we propose a Verification specification for Natural Language Understanding classification based on larger regions of interest, and we discuss the challenges of such task. We observe that, although the data is almost linearly separable, the verifier struggles to output positive results and we explain the problems and implications.
LOJul 21, 2022
CheckINN: Wide Range Neural Network Verification in Imandra (Extended)Remi Desmartin, Grant Passmore, Ekaterina Komendantskaya et al.
Neural networks are increasingly relied upon as components of complex safety-critical systems such as autonomous vehicles. There is high demand for tools and methods that embed neural network verification in a larger verification cycle. However, neural network verification is difficult due to a wide range of verification properties of interest, each typically only amenable to verification in specialised solvers. In this paper, we show how Imandra, a functional programming language and a theorem prover originally designed for verification, validation and simulation of financial infrastructure can offer a holistic infrastructure for neural network verification. We develop a novel library CheckINN that formalises neural networks in Imandra, and covers different important facets of neural network verification.
AIJul 14, 2022
Differentiable Logics for Neural Network Training and VerificationNatalia Slusarz, Ekaterina Komendantskaya, Matthew L. Daggitt et al.
The rising popularity of neural networks (NNs) in recent years and their increasing prevalence in real-world applications have drawn attention to the importance of their verification. While verification is known to be computationally difficult theoretically, many techniques have been proposed for solving it in practice. It has been observed in the literature that by default neural networks rarely satisfy logical constraints that we want to verify. A good course of action is to train the given NN to satisfy said constraint prior to verifying them. This idea is sometimes referred to as continuous verification, referring to the loop between training and verification. Usually training with constraints is implemented by specifying a translation for a given formal logic language into loss functions. These loss functions are then used to train neural networks. Because for training purposes these functions need to be differentiable, these translations are called differentiable logics (DL). This raises several research questions. What kind of differentiable logics are possible? What difference does a specific choice of DL make in the context of continuous verification? What are the desirable criteria for a DL viewed from the point of view of the resulting loss function? In this extended abstract we will discuss and answer these questions.
LOMay 13
Quantitative Linear Logic for Neuro-Symbolic Learning and VerificationThomas Flinkow, Ekaterina Komendantskaya, Matteo Capucci et al.
Differentiable Logics are deployed in neuro-symbolic learning tasks as a way of embedding logical constraints in the training objective of neural networks. A differentiable logic consists of a syntax to write logical properties and a semantics to interpret them as real-valued functions to be folded in the loss function. A defining trade-off of the field is that between logical properties of the connectives, and analytic concerns for the semantics, with both aspects being relevant in applications. At one extreme we find fuzzy logics, that have well-established algebraic and proof-theoretic foundations, and at the other ad-hoc differentiable logics like Fischer's DL2, conceived for deep learning applications. However, no satisfactory foundation has emerged yet. We propose a resolution to this long-standing tension via a novel logic, Quantitative Linear Logic (QLL), with foundational ambitions. Our design is driven by naturality -- the idea that, since logical constraints are translated to losses, the semantics of the connectives should be pertinent operations used in ML practice (that is, sum and log-sum-exp) on additive quantities (like logits). We then judge the result on two aspects: logical adequacy -- that they satisfy most of the standard logical laws of Linear Logic; and empirical effectiveness -- test-time performance (as measured by adversarial attacks) is well-correlated to the actual verification of the logical constraints (as measured by off-the-shelf neural network verifiers), which makes QLL stand out among SoTA techniques.
LOMay 13
Quantitative Linear LogicMatteo Capucci, Robert Atkey, Charles Grellois et al.
Real-valued logics have seen a renewed interest in verification for probabilistic and quantitative systems, in particular machine learning models, where they can be used to directly integrate specifications in the training objective. To do so effectively one has to strike a balance between the logical properties of the connectives and their semantics. A major hurdle in this sense is to give ``soft'' (i.e. differentiable) semantics to additive connectives -- in linear and fuzzy logics, additives are necessarily ``hard'' lattice operations. In this paper, we solve this problem by combining an accurate analysis of the properties of sum and product on the reals with a significant revision of sequent calculus. We introduce `quantitative sequent calculi', which simultaneously generalize hypersequent calculi of fuzzy logics and deep inference, and in which validity of a proof and provability of a sequent are real-valued quantities. We present a family of calculi, pQLL, indexed by a hardness degree $p$, prove cut-elimination theorem for them, and show completeness for enriched residuated `soft' lattices. For $p = \infty$, pQLL reduces to MALL, with provability in pQLL converging to provability in MALL when $p \to \infty$.
LGMay 1, 2025Code
A General Framework for Property-Driven Machine LearningThomas Flinkow, Marco Casadio, Colin Kessler et al.
Neural networks have been shown to frequently fail to learn critical safety and correctness properties purely from data, highlighting the need for training methods that directly integrate logical specifications. While adversarial training can be used to improve robustness to small perturbations within $ε$-cubes, domains other than computer vision -- such as control systems and natural language processing -- may require more flexible input region specifications via generalised hyper-rectangles. Differentiable logics offer a way to encode arbitrary logical constraints as additional loss terms that guide the learning process towards satisfying these constraints. In this paper, we investigate how these two complementary approaches can be unified within a single framework for property-driven machine learning, as a step toward effective formal verification of neural networks. We show that well-known properties from the literature are subcases of this general approach, and we demonstrate its practical effectiveness on a case study involving a neural network controller for a drone system. Our framework is made publicly available at https://github.com/tflinkow/property-driven-ml.
PLMay 4
Compositional Neural-Cyber-Physical System Verification in the Interactive Theorem Prover of Your ChoiceMatthew L. Daggit, Alistair Sirman, Alessandro Bruni et al.
Formal verification of neuro-symbolic cyber-physical systems, such as drones, medical devices and robots, is complicated. Neural components must be trained to be optimal with respect to the available data as well as the safety specifications, and then verified using specialised solvers. Symbolic models of the "cyber" and "physical" behaviour of the system must be constructed and verified in interactive theorem provers (ITPs), often requiring mature mathematical libraries to reason about the interplay of discrete and continuous dynamics, preferably obtaining infinite time-horizon guarantees. Finally, the results of the two already challenging verification tasks need to be integrated into a single proof in a coherent and consistent way, whilst preserving deployability of the resulting model. In this paper we present a compositional methodology for constructing such proofs. The Vehicle framework provides a functional, domain-specific language for specifying, training, and verifying neural components. We extend Vehicle to allow integration with any ITP with minimal effort. First, we describe how Vehicle's standard bidirectional type checker can be reused to transpile neural specifications into an intermediate representation targeting multiple theorem provers. Second, we integrate Vehicle with Rocq, Isabelle/HOL, Agda and the industrial prover Imandra; and showcase a generic infinite time-horizon safety proof of a discrete cyber-physical system with a neural network controller in each ITP. Finally, we use the Mathematical Components libraries in Rocq to verify infinite time-horizon safety of a medical device, modelled as a continuous cyber-physical system with a neural controller. To our knowledge, this is the first result of this kind in a general purpose ITP; and a result that was only feasible thanks to the compositionality provided by Vehicle's functional interface.
AIJan 12, 2024
Vehicle: Bridging the Embedding Gap in the Verification of Neuro-Symbolic ProgramsMatthew L. Daggitt, Wen Kokke, Robert Atkey et al.
Neuro-symbolic programs, i.e. programs containing both machine learning components and traditional symbolic code, are becoming increasingly widespread. Finding a general methodology for verifying such programs is challenging due to both the number of different tools involved and the intricate interface between the ``neural'' and ``symbolic'' program components. In this paper we present a general decomposition of the neuro-symbolic verification problem into parts, and examine the problem of the embedding gap that occurs when one tries to combine proofs about the neural and symbolic components. To address this problem we then introduce Vehicle -- standing as an abbreviation for a ``verification condition language'' -- an intermediate programming language interface between machine learning frameworks, automated theorem provers, and dependently-typed formalisations of neuro-symbolic programs. Vehicle allows users to specify the properties of the neural components of neuro-symbolic programs once, and then safely compile the specification to each interface using a tailored typing and compilation procedure. We give a high-level overview of Vehicle's overall design, its interfaces and compilation & type-checking procedures, and then demonstrate its utility by formally verifying the safety of a simple autonomous car controlled by a neural network, operating in a stochastic environment with imperfect information.
CLMar 15, 2024
NLP Verification: Towards a General Methodology for Certifying RobustnessMarco Casadio, Tanvi Dinkar, Ekaterina Komendantskaya et al.
Machine Learning (ML) has exhibited substantial success in the field of Natural Language Processing (NLP). For example large language models have empirically proven to be capable of producing text of high complexity and cohesion. However, they are prone to inaccuracies and hallucinations. As these systems are increasingly integrated into real-world applications, ensuring their safety and reliability becomes a primary concern. There are safety critical contexts where such models must be robust to variability or attack, and give guarantees over their output. Computer Vision had pioneered the use of formal verification of neural networks for such scenarios and developed common verification standards and pipelines, leveraging precise formal reasoning about geometric properties of data manifolds. In contrast, NLP verification methods have only recently appeared in the literature. While presenting sophisticated algorithms, these papers have not yet crystallised into a common methodology. They are often light on the pragmatical issues of NLP verification and the area remains fragmented. In this paper, we attempt to distil and evaluate general components of an NLP verification pipeline, that emerges from the progress in the field to date. Our contributions are two-fold. Firstly, we propose a general methodology to analyse the effect of the embedding gap, a problem that refers to the discrepancy between verification of geometric subspaces and the semantic meaning of sentences, which the geometric subspaces are supposed to represent. We propose a number of practical NLP methods that can help to quantify the effects of the embedding gap. Secondly, we give a general method for training and verification of neural networks that leverages a more precise geometric estimation of semantic similarity of sentences in the embedding space and helps to overcome the effects of the embedding gap in practice.
PLJan 10, 2025
Neural Network Verification is a Programming Language ChallengeLucas C. Cordeiro, Matthew L. Daggitt, Julien Girard-Satabin et al.
Neural network verification is a new and rapidly developing field of research. So far, the main priority has been establishing efficient verification algorithms and tools, while proper support from the programming language perspective has been considered secondary or unimportant. Yet, there is mounting evidence that insights from the programming language community may make a difference in the future development of this domain. In this paper, we formulate neural network verification challenges as programming language challenges and suggest possible future solutions.
LOMay 17, 2024
A Certified Proof Checker for Deep Neural Network Verification in ImandraRemi Desmartin, Omri Isac, Grant Passmore et al.
Recent advances in the verification of deep neural networks (DNNs) have opened the way for a broader usage of DNN verification technology in many application areas, including safety-critical ones. However, DNN verifiers are themselves complex programs that have been shown to be susceptible to errors and numerical imprecision; this, in turn, has raised the question of trust in DNN verifiers. One prominent attempt to address this issue is enhancing DNN verifiers with the capability of producing certificates of their results that are subject to independent algorithmic checking. While formulations of Marabou certificate checking already exist on top of the state-of-the-art DNN verifier Marabou, they are implemented in C++, and that code itself raises the question of trust (e.g., in the precision of floating point calculations or guarantees for implementation soundness). Here, we present an alternative implementation of the Marabou certificate checking in Imandra -- an industrial functional programming language and an interactive theorem prover (ITP) -- that allows us to obtain full proof of certificate correctness. The significance of the result is two-fold. Firstly, it gives stronger independent guarantees for Marabou proofs. Secondly, it opens the way for the wider adoption of DNN verifiers in interactive theorem proving in the same way as many ITPs already incorporate SMT solvers.
PLApr 16, 2025
Proof-Carrying Neuro-Symbolic CodeEkaterina Komendantskaya
This invited paper introduces the concept of "proof-carrying neuro-symbolic code" and explains its meaning and value, from both the "neural" and the "symbolic" perspectives. The talk outlines the first successes and challenges that this new area of research faces.
ROMay 1, 2025
Neural Network Verification for Gliding Drone Control: A Case StudyColin Kessler, Ekaterina Komendantskaya, Marco Casadio et al.
As machine learning is increasingly deployed in autonomous systems, verification of neural network controllers is becoming an active research domain. Existing tools and annual verification competitions suggest that soon this technology will become effective for real-world applications. Our application comes from the emerging field of microflyers that are passively transported by the wind, which may have various uses in weather or pollution monitoring. Specifically, we investigate centimetre-scale bio-inspired gliding drones that resemble Alsomitra macrocarpa diaspores. In this paper, we propose a new case study on verifying Alsomitra-inspired drones with neural network controllers, with the aim of adhering closely to a target trajectory. We show that our system differs substantially from existing VNN and ARCH competition benchmarks, and show that a combination of tools holds promise for verifying such systems in the future, if certain shortcomings can be overcome. We propose a novel method for robust training of regression networks, and investigate formalisations of this case study in Vehicle and CORA. Our verification results suggest that the investigated training methods do improve performance and robustness of neural network controllers in this application, but are limited in scope and usefulness. This is due to systematic limitations of both Vehicle and CORA, and the complexity of our system reducing the scale of reachability, which we investigate in detail. If these limitations can be overcome, it will enable engineers to develop safe and robust technologies that improve people's lives and reduce our impact on the environment.
CVMay 24, 2024
A Neurosymbolic Framework for Bias Correction in Convolutional Neural NetworksParth Padalkar, Natalia Ślusarz, Ekaterina Komendantskaya et al.
Recent efforts in interpreting Convolutional Neural Networks (CNNs) focus on translating the activation of CNN filters into a stratified Answer Set Program (ASP) rule-sets. The CNN filters are known to capture high-level image concepts, thus the predicates in the rule-set are mapped to the concept that their corresponding filter represents. Hence, the rule-set exemplifies the decision-making process of the CNN w.r.t the concepts that it learns for any image classification task. These rule-sets help understand the biases in CNNs, although correcting the biases remains a challenge. We introduce a neurosymbolic framework called NeSyBiCor for bias correction in a trained CNN. Given symbolic concepts, as ASP constraints, that the CNN is biased towards, we convert the concepts to their corresponding vector representations. Then, the CNN is retrained using our novel semantic similarity loss that pushes the filters away from (or towards) learning the desired/undesired concepts. The final ASP rule-set obtained after retraining, satisfies the constraints to a high degree, thus showing the revision in the knowledge of the CNN. We demonstrate that our NeSyBiCor framework successfully corrects the biases of CNNs trained with subsets of classes from the "Places" dataset while sacrificing minimal accuracy and improving interpretability.
AIJan 25, 2024
Marabou 2.0: A Versatile Formal Analyzer of Neural NetworksHaoze Wu, Omri Isac, Aleksandar Zeljić et al.
This paper serves as a comprehensive system description of version 2.0 of the Marabou framework for formal analysis of neural networks. We discuss the tool's architectural design and highlight the major features and components introduced since its initial release.
CLMay 6, 2023
ANTONIO: Towards a Systematic Method of Generating NLP Benchmarks for VerificationMarco Casadio, Luca Arnaboldi, Matthew L. Daggitt et al.
Verification of machine learning models used in Natural Language Processing (NLP) is known to be a hard problem. In particular, many known neural network verification methods that work for computer vision and other numeric datasets do not work for NLP. Here, we study technical reasons that underlie this problem. Based on this analysis, we propose practical methods and heuristics for preparing NLP datasets and models in a way that renders them amenable to known verification methods based on abstract interpretation. We implement these methods as a Python library called ANTONIO that links to the neural network verifiers ERAN and Marabou. We perform evaluation of the tool using an NLP dataset R-U-A-Robot suggested as a benchmark for verifying legally critical NLP applications. We hope that, thanks to its general applicability, this work will open novel possibilities for including NLP verification problems into neural network verification competitions, and will popularise NLP problems within this community.
AIMay 24, 2021
Actions You Can Handle: Dependent Types for AI PlansAlasdair Hill, Ekaterina Komendantskaya, Matthew L. Daggitt et al.
Verification of AI is a challenge that has engineering, algorithmic and programming language components. For example, AI planners are deployed to model actions of autonomous agents. They comprise a number of searching algorithms that, given a set of specified properties, find a sequence of actions that satisfy these properties. Although AI planners are mature tools from the algorithmic and engineering points of view, they have limitations as programming languages. Decidable and efficient automated search entails restrictions on the syntax of the language, prohibiting use of higher-order properties or recursion. This paper proposes a methodology for embedding plans produced by AI planners into dependently-typed language Agda, which enables users to reason about and verify more general and abstract properties of plans, and also provides a more holistic programming language infrastructure for modelling plan execution.
LGApr 3, 2021
Neural Network Robustness as a Verification Property: A Principled Case StudyMarco Casadio, Ekaterina Komendantskaya, Matthew L. Daggitt et al.
Neural networks are very successful at detecting patterns in noisy data, and have become the technology of choice in many fields. However, their usefulness is hampered by their susceptibility to adversarial attacks. Recently, many methods for measuring and improving a network's robustness to adversarial perturbations have been proposed, and this growing body of research has given rise to numerous explicit or implicit notions of robustness. Connections between these notions are often subtle, and a systematic comparison between them is missing in the literature. In this paper we begin addressing this gap, by setting up general principles for the empirical analysis and evaluation of a network's robustness as a mathematical property - during the network's training phase, its verification, and after its deployment. We then apply these principles and conduct a case study that showcases the practical benefits of our general approach.
LOAug 10, 2020
Proof-Carrying Plans: a Resource Logic for AI PlanningAlasdair Hill, Ekaterina Komendantskaya, Ronald P. A. Petrick
Recent trends in AI verification and Explainable AI have raised the question of whether AI planning techniques can be verified. In this paper, we present a novel resource logic, the Proof Carrying Plans (PCP) logic that can be used to verify plans produced by AI planners. The PCP logic takes inspiration from existing resource logics (such as Linear logic and Separation logic) as well as Hoare logic when it comes to modelling states and resource-aware plan execution. It also capitalises on the Curry-Howard approach to logics, in its treatment of plans as functions and plan pre- and post-conditions as types. This paper presents two main results. From the theoretical perspective, we show that the PCP logic is sound relative to the standard possible world semantics used in AI planning. From the practical perspective, we present a complete Agda formalisation of the PCP logic and of its soundness proof. Moreover, we showcase the Curry-Howard, or functional, value of this implementation by supplementing it with the library that parses AI plans into Agda's proofs automatically. We provide evaluation of this library and the resulting Agda functions.
AIJul 2, 2019
Neural Network Verification for the Masses (of AI graduates)Ekaterina Komendantskaya, Rob Stewart, Kirsy Duncan et al.
Rapid development of AI applications has stimulated demand for, and has given rise to, the rapidly growing number and diversity of AI MSc degrees. AI and Robotics research communities, industries and students are becoming increasingly aware of the problems caused by unsafe or insecure AI applications. Among them, perhaps the most famous example is vulnerability of deep neural networks to ``adversarial attacks''. Owing to wide-spread use of neural networks in all areas of AI, this problem is seen as particularly acute and pervasive. Despite of the growing number of research papers about safety and security vulnerabilities of AI applications, there is a noticeable shortage of accessible tools, methods and teaching materials for incorporating verification into AI programs. LAIV -- the Lab for AI and Verification -- is a newly opened research lab at Heriot-Watt university that engages AI and Robotics MSc students in verification projects, as part of their MSc dissertation work. In this paper, we will report on successes and unexpected difficulties LAIV faces, many of which arise from limitations of existing programming languages used for verification. We will discuss future directions for incorporating verification into AI degrees.
SEMar 5, 2013
Statistical Proof Pattern Recognition: Automated or Interactive?Jónathan Heras, Ekaterina Komendantskaya
In this paper, we compare different existing approaches employed in data mining of big proof libraries in automated and interactive theorem proving.
LOFeb 26, 2013
ML4PG in Computer Algebra verificationJónathan Heras, Ekaterina Komendantskaya
ML4PG is a machine-learning extension that provides statistical proof hints during the process of Coq/SSReflect proof development. In this paper, we use ML4PG to find proof patterns in the CoqEAL library -- a library that was devised to verify the correctness of Computer Algebra algorithms. In particular, we use ML4PG to help us in the formalisation of an efficient algorithm to compute the inverse of triangular matrices.
AIJan 25, 2013
Recycling Proof Patterns in Coq: Case StudiesJónathan Heras, Ekaterina Komendantskaya
Development of Interactive Theorem Provers has led to the creation of big libraries and varied infrastructures for formal proofs. However, despite (or perhaps due to) their sophistication, the re-use of libraries by non-experts or across domains is a challenge. In this paper, we provide detailed case studies and evaluate the machine-learning tool ML4PG built to interactively data-mine the electronic libraries of proofs, and to provide user guidance on the basis of proof patterns found in the existing libraries.