38.8LOApr 14
Heterogeneous Dynamic Logic: Provability Modulo Program TheoriesSamuel Teuber, Mattias Ulbrich, André Platzer et al.
Formally specifying, let alone verifying, properties of systems involving multiple programming languages is inherently challenging. We introduce Heterogeneous Dynamic Logic (HDL), a framework for combining reasoning principles from distinct (dynamic) program logics in a modular and compositional way. HDL mirrors the architecture of satisfiability modulo theories (SMT): Individual dynamic logics, along with their calculi, are treated as dynamic theories that can be combined to reason about heterogeneous systems whose components are verified using different program logics. HDL provides two key operations: Lifting extends an individual dynamic theory with new program constructs (e.g., the havoc operation or regular programs) and automatically augments its calculus with sound reasoning principles for the new constructs; and Combination enables cross-language reasoning in a single modality via Heterogeneous Dynamic Theories, facilitating the reuse of existing proof infrastructure. By lifting combined theories with regular programs, we obtain heterogeneous control structures that allow us to reason about intertwined cross-language behavior. We formalize dynamic theories, their lifting and combination, and prove the soundness of all proof rules in Isabelle. We also introduce a proof rule combining deductive DL-based reasoning with reasoning principles from Kleene Algebras with Tests. Furthermore, we prove relative completeness theorems for lifting and combination: Under usual assumptions, reasoning about lifted or combined theories is no harder than reasoning about the constituent dynamic theories and their common first-order structure (i.e., the data theory). We demonstrate HDL's value by verifying an automotive case study where a Java controller (formalized in Java dynamic logic) steers a plant model (formalized in differential dynamic logic).
SEFeb 3, 2025
Next Steps in LLM-Supported Java VerificationSamuel Teuber, Bernhard Beckert
Recent work has shown that Large Language Models (LLMs) are not only a suitable tool for code generation but also capable of generating annotation-based code specifications. Scaling these methodologies may allow us to deduce provable correctness guarantees for large-scale software systems. In comparison to other LLM tasks, the application field of deductive verification has the notable advantage of providing a rigorous toolset to check LLM-generated solutions. This short paper provides early results on how this rigorous toolset can be used to reliably elicit correct specification annotations from an unreliable LLM oracle.
LGOct 26, 2024
Revisiting Differential Verification: Equivalence Verification with ConfidenceSamuel Teuber, Philipp Kern, Marvin Janzen et al.
When validated neural networks (NNs) are pruned (and retrained) before deployment, it is desirable to prove that the new NN behaves equivalently to the (original) reference NN. To this end, our paper revisits the idea of differential verification which performs reasoning on differences between NNs: On the one hand, our paper proposes a novel abstract domain for differential verification admitting more efficient reasoning about equivalence. On the other hand, we investigate empirically and theoretically which equivalence properties are (not) efficiently solved using differential reasoning. Based on the gained insights, and following a recent line of work on confidence-based verification, we propose a novel equivalence property that is amenable to Differential Verification while providing guarantees for large parts of the input space instead of small-scale guarantees constructed w.r.t. predetermined input points. We implement our approach in a new tool called VeryDiff and perform an extensive evaluation on numerous old and new benchmark families, including new pruned NNs for particle jet classification in the context of CERN's LHC where we observe median speedups >300x over the State-of-the-Art verifier alpha,beta-CROWN.
SYJul 30, 2025
Of Good Demons and Bad Angels: Guaranteeing Safe Control under Finite PrecisionSamuel Teuber, Debasmita Lohar, Bernhard Beckert
As neural networks (NNs) become increasingly prevalent in safety-critical neural network-controlled cyber-physical systems (NNCSs), formally guaranteeing their safety becomes crucial. For these systems, safety must be ensured throughout their entire operation, necessitating infinite-time horizon verification. To verify the infinite-time horizon safety of NNCSs, recent approaches leverage Differential Dynamic Logic (dL). However, these dL-based guarantees rely on idealized, real-valued NN semantics and fail to account for roundoff errors introduced by finite-precision implementations. This paper bridges the gap between theoretical guarantees and real-world implementations by incorporating robustness under finite-precision perturbations -- in sensing, actuation, and computation -- into the safety verification. We model the problem as a hybrid game between a good Demon, responsible for control actions, and a bad Angel, introducing perturbations. This formulation enables formal proofs of robustness w.r.t. a given (bounded) perturbation. Leveraging this bound, we employ state-of-the-art mixed-precision fixed-point tuners to synthesize sound and efficient implementations, thus providing a complete end-to-end solution. We evaluate our approach on case studies from the automotive and aeronautics domains, producing efficient NN implementations with rigorous infinite-time horizon safety guarantees.
CRDec 15, 2023
An Information-Flow Perspective on Algorithmic FairnessSamuel Teuber, Bernhard Beckert
This work presents insights gained by investigating the relationship between algorithmic fairness and the concept of secure information flow. The problem of enforcing secure information flow is well-studied in the context of information security: If secret information may "flow" through an algorithm or program in such a way that it can influence the program's output, then that is considered insecure information flow as attackers could potentially observe (parts of) the secret. There is a strong correspondence between secure information flow and algorithmic fairness: if protected attributes such as race, gender, or age are treated as secret program inputs, then secure information flow means that these ``secret'' attributes cannot influence the result of a program. While most research in algorithmic fairness evaluation concentrates on studying the impact of algorithms (often treating the algorithm as a black-box), the concepts derived from information flow can be used both for the analysis of disparate treatment as well as disparate impact w.r.t. a structural causal model. In this paper, we examine the relationship between quantitative as well as qualitative information-flow properties and fairness. Moreover, based on this duality, we derive a new quantitative notion of fairness called fairness spread, which can be easily analyzed using quantitative information flow and which strongly relates to counterfactual fairness. We demonstrate that off-the-shelf tools for information-flow properties can be used in order to formally analyze a program's algorithmic fairness properties, including the new notion of fairness spread as well as established notions such as demographic parity.
LOOct 20, 2019
Relational Test Tables: A Practical Specification Language for Evolution and SecurityAlexander Weigl, Mattias Ulbrich, Suhyun Cha et al.
A wide range of interesting program properties are intrinsically relational, i.e., they relate two or more program traces. Two prominent relational properties are secure information flow and conditional program equivalence. By showing the absence of illegal information flow, confidentiality and integrity properties can be proved. Equivalence proofs allow using an existing (trusted) software release as specification for new revisions. Currently, the verification of relational properties is hardly accessible to practitioners due to the lack of appropriate relational specification languages. In previous work, we introduced the concept of generalised test tables: a table-based specification language for functional (non-relational) properties of reactive systems. In this paper, we present relational test tables -- a canonical extension of generalised test tables for the specification of relational properties, which refer to two or more program runs or traces. Regression test tables support asynchronous program runs via stuttering. We show the applicability of relational test tables, using them for the specification and verification of two examples from the domain of automated product systems.
SEJul 9, 2019
Understanding Counterexamples for Relational Properties with DIbuggerMihai Herda, Michael Kirsten, Etienne Brunner et al.
Software verification is a tedious process that involves the analysis of multiple failed verification attempts, and adjustments of the program or specification. This is especially the case for complex requirements, e.g., regarding security or fairness, when one needs to compare multiple related runs of the same software. Verification tools often provide counterexamples consisting of program inputs when a proof attempt fails, however it is often not clear why the reported counterexample leads to a violation of the checked property. In this paper, we enhance this aspect of the software verification process by providing DIbugger, a tool for analyzing counterexamples of relational properties, allowing the user to debug multiple related programs simultaneously.
SEFeb 7, 2018
Experience Report: Formal Methods in Material ScienceBernhard Beckert, Britta Nestler, Moritz Kiefer et al.
Increased demands in the field of scientific computation require that algorithms be more efficiently implemented. Maintaining correctness in addition to efficiency is a challenge that software engineers in the field have to face. In this report we share our first impressions and experiences on the applicability of formal methods to such design challenges arising in the development of scientific computation software in the field of material science. We investigated two different algorithms, one for load distribution and one for the computation of convex hulls, and demonstrate how formal methods have been used to discover counterexamples to the correctness of the existing implementations as well as proving the correctness of a revised algorithm. The techniques employed for this include SMT solvers, and automatic and interactive verification tools.
LOJan 26, 2018
Relational Equivalence Proofs Between Imperative and MapReduce AlgorithmsBernhard Beckert, Timo Bingmann, Moritz Kiefer et al.
MapReduce frameworks are widely used for the implementation of distributed algorithms. However, translating imperative algorithms into these frameworks requires significant structural changes to the algorithm. As the costs of running faulty algorithms at scale can be severe, it is highly desirable to verify the correctness of the translation, i.e., to prove that the MapReduce version is equivalent to the imperative original. We present a novel approach for proving equivalence between imperative and MapReduce algorithms based on partitioning the equivalence proof into a sequence of equivalence proofs between intermediate programs with smaller differences. Our approach is based on the insight that two kinds of sub-proofs are required: (1) uniform transformations changing the controlflow structure that are mostly independent of the particular context in which they are applied; and (2) context-dependent transformations that are not uniform but that preserve the overall structure and can be proved correct using coupling invariants. We demonstrate the feasibility of our approach by evaluating it on two prototypical algorithms commonly used as examples in MapReduce frameworks: k-means and PageRank. To carry out the proofs, we use the interactive theorem prover Coq with partial proof automation. The results show that our approach and its prototypical implementation based on Coq enables equivalence proofs of non-trivial algorithms and could be automated to a large degree.
LOOct 30, 2014
How to Put Usability into Focus: Using Focus Groups to Evaluate the Usability of Interactive Theorem ProversBernhard Beckert, Sarah Grebing, Florian Böhl
In recent years the effectiveness of interactive theorem provers has increased to an extent that the bottleneck in the interactive process shifted to efficiency: while in principle large and complex theorems are provable (effectiveness), it takes a lot of effort for the user interacting with the system (lack of efficiency). We conducted focus groups to evaluate the usability of Isabelle/HOL and the KeY system with two goals: (a) detect usability issues in the interaction between interactive theorem provers and their user, and (b) analyze how evaluation and survey methods commonly used in the area of human-computer interaction, such as focus groups and co-operative evaluation, are applicable to the specific field of interactive theorem proving (ITP). In this paper, we report on our experience using the evaluation method focus groups and how we adapted this method to ITP. We describe our results and conclusions mainly on the "meta-level," i.e., we focus on the impact that specific characteristics of ITPs have on the setup and the results of focus groups. On the concrete level, we briefly summarise insights into the usability of the ITPs used in our case study.