74.8GTMay 13
Dicey Games: Shared Sources of Randomness in Distributed SystemsLéonard Brice, Thomas A. Henzinger, K. S. Thejaswini
Consider a 4-player version of Matching Pennies where a team of three players competes against the Devil. Each player simultaneously says "Heads" or "Tails". The team wins if all four choices match; otherwise the Devil wins. If all team players randomise independently, they win with probability 1/8; if all players share a common source of randomness, they win with probability 1/2. What happens when each pair of team players shares a source of randomness? Can the team do better than win with probability 1/4? The surprising (and nontrivial) answer is yes! We introduce Dicey Games, a formal framework motivated by the study of distributed systems with shared sources of randomness (of which the above example is a specific instance). We characterise the existence, representation and computational complexity of optimal strategies in Dicey Games, and we study the problem of allocating limited sources of randomness optimally within a team.
QMMay 5, 2010
Hybrid Numerical Solution of the Chemical Master EquationThomas A. Henzinger, Maria Mateescu, Linar Mikeev et al.
We present a numerical approximation technique for the analysis of continuous-time Markov chains that describe networks of biochemical reactions and play an important role in the stochastic modeling of biological systems. Our approach is based on the construction of a stochastic hybrid model in which certain discrete random variables of the original Markov chain are approximated by continuous deterministic variables. We compute the solution of the stochastic hybrid model using a numerical algorithm that discretizes time and in each step performs a mutual update of the transient probability distribution of the discrete stochastic variables and the values of the continuous deterministic variables. We implemented the algorithm and we demonstrate its usefulness and efficiency on several case studies from systems biology.
LGOct 11, 2022
Learning Control Policies for Stochastic Systems with Reach-avoid GuaranteesĐorđe Žikelić, Mathias Lechner, Thomas A. Henzinger et al.
We study the problem of learning controllers for discrete-time non-linear stochastic dynamical systems with formal reach-avoid guarantees. This work presents the first method for providing formal reach-avoid guarantees, which combine and generalize stability and safety guarantees, with a tolerable probability threshold $p\in[0,1]$ over the infinite time horizon. Our method leverages advances in machine learning literature and it represents formal certificates as neural networks. In particular, we learn a certificate in the form of a reach-avoid supermartingale (RASM), a novel notion that we introduce in this work. Our RASMs provide reachability and avoidance guarantees by imposing constraints on what can be viewed as a stochastic extension of level sets of Lyapunov functions for deterministic systems. Our approach solves several important problems -- it can be used to learn a control policy from scratch, to verify a reach-avoid specification for a fixed control policy, or to fine-tune a pre-trained policy if it does not satisfy the reach-avoid specification. We validate our approach on $3$ stochastic non-linear reinforcement learning tasks.
ROApr 15, 2022
Revisiting the Adversarial Robustness-Accuracy Tradeoff in Robot LearningMathias Lechner, Alexander Amini, Daniela Rus et al.
Adversarial training (i.e., training on adversarially perturbed input data) is a well-studied method for making neural networks robust to potential adversarial attacks during inference. However, the improved robustness does not come for free but rather is accompanied by a decrease in overall model accuracy and performance. Recent work has shown that, in practical robot learning applications, the effects of adversarial training do not pose a fair trade-off but inflict a net loss when measured in holistic robot performance. This work revisits the robustness-accuracy trade-off in robot learning by systematically analyzing if recent advances in robust training methods and theory in conjunction with adversarial robot learning, are capable of making adversarial training suitable for real-world robot applications. We evaluate three different robot learning tasks ranging from autonomous driving in a high-fidelity environment amenable to sim-to-real deployment to mobile robot navigation and gesture recognition. Our results demonstrate that, while these techniques make incremental improvements on the trade-off on a relative scale, the negative impact on the nominal accuracy caused by adversarial training still outweighs the improved robustness by an order of magnitude. We conclude that although progress is happening, further advances in robust learning methods are necessary before they can benefit robot learning tasks in practice.
LGOct 11, 2022
Learning Provably Stabilizing Neural Controllers for Discrete-Time Stochastic SystemsMatin Ansaripour, Krishnendu Chatterjee, Thomas A. Henzinger et al.
We consider the problem of learning control policies in discrete-time stochastic systems which guarantee that the system stabilizes within some specified stabilization region with probability~$1$. Our approach is based on the novel notion of stabilizing ranking supermartingales (sRSMs) that we introduce in this work. Our sRSMs overcome the limitation of methods proposed in previous works whose applicability is restricted to systems in which the stabilizing region cannot be left once entered under any control policy. We present a learning procedure that learns a control policy together with an sRSM that formally certifies probability~$1$ stability, both learned as neural networks. We show that this procedure can also be adapted to formally verifying that, under a given Lipschitz continuous control policy, the stochastic system stabilizes within some stabilizing region with probability~$1$. Our experimental evaluation shows that our learning procedure can successfully learn provably stabilizing policies in practice.
CVOct 9, 2022
Are All Vision Models Created Equal? A Study of the Open-Loop to Closed-Loop Causality GapMathias Lechner, Ramin Hasani, Alexander Amini et al.
There is an ever-growing zoo of modern neural network models that can efficiently learn end-to-end control from visual observations. These advanced deep models, ranging from convolutional to patch-based networks, have been extensively tested on offline image classification and regression tasks. In this paper, we study these vision architectures with respect to the open-loop to closed-loop causality gap, i.e., offline training followed by an online closed-loop deployment. This causality gap typically emerges in robotics applications such as autonomous driving, where a network is trained to imitate the control commands of a human. In this setting, two situations arise: 1) Closed-loop testing in-distribution, where the test environment shares properties with those of offline training data. 2) Closed-loop testing under distribution shifts and out-of-distribution. Contrary to recently reported results, we show that under proper training guidelines, all vision models perform indistinguishably well on in-distribution deployment, resolving the causality gap. In situation 2, We observe that the causality gap disrupts performance regardless of the choice of the model architecture. Our results imply that the causality gap can be solved in situation one with our proposed training guideline with any modern network architecture, whereas achieving out-of-distribution generalization (situation two) requires further investigations, for instance, on data diversity rather than the model architecture.
AIAug 1, 2023
Monitoring Algorithmic Fairness under Partial ObservationsThomas A. Henzinger, Konstantin Kueffner, Kaushik Mallik
As AI and machine-learned software are used increasingly for making decisions that affect humans, it is imperative that they remain fair and unbiased in their decisions. To complement design-time bias mitigation measures, runtime verification techniques have been introduced recently to monitor the algorithmic fairness of deployed systems. Previous monitoring techniques assume full observability of the states of the (unknown) monitored system. Moreover, they can monitor only fairness properties that are specified as arithmetic expressions over the probabilities of different events. In this work, we extend fairness monitoring to systems modeled as partially observed Markov chains (POMC), and to specifications containing arithmetic expressions over the expected values of numerical functions on event sequences. The only assumptions we make are that the underlying POMC is aperiodic and starts in the stationary distribution, with a bound on its mixing time being known. These assumptions enable us to estimate a given property for the entire distribution of possible executions of the monitored POMC, by observing only a single execution. Our monitors observe a long run of the system and, after each new observation, output updated PAC-estimates of how fair or biased the system is. The monitors are computationally lightweight and, using a prototype implementation, we demonstrate their effectiveness on several real-world examples.
LGJun 2, 2022
Entangled Residual MappingsMathias Lechner, Ramin Hasani, Zahra Babaiee et al.
Residual mappings have been shown to perform representation learning in the first layers and iterative feature refinement in higher layers. This interplay, combined with their stabilizing effect on the gradient norms, enables them to train very deep networks. In this paper, we take a step further and introduce entangled residual mappings to generalize the structure of the residual connections and evaluate their role in iterative learning representations. An entangled residual mapping replaces the identity skip connections with specialized entangled mappings such as orthogonal, sparse, and structural correlation matrices that share key attributes (eigenvalues, structure, and Jacobian norm) with identity mappings. We show that while entangled mappings can preserve the iterative refinement of features across various deep models, they influence the representation learning process in convolutional networks differently than attention-based models and recurrent neural networks. In general, we find that for CNNs and Vision Transformers entangled sparse mapping can help generalization while orthogonal mappings hurt performance. For recurrent networks, orthogonal residual mappings form an inductive bias for time-variant sequences, which degrades accuracy on time-invariant tasks.
LGNov 29, 2022
Quantization-aware Interval Bound Propagation for Training Certifiably Robust Quantized Neural NetworksMathias Lechner, Đorđe Žikelić, Krishnendu Chatterjee et al.
We study the problem of training and certifying adversarially robust quantized neural networks (QNNs). Quantization is a technique for making neural networks more efficient by running them using low-bit integer arithmetic and is therefore commonly adopted in industry. Recent work has shown that floating-point neural networks that have been verified to be robust can become vulnerable to adversarial attacks after quantization, and certification of the quantized representation is necessary to guarantee robustness. In this work, we present quantization-aware interval bound propagation (QA-IBP), a novel method for training robust QNNs. Inspired by advances in robust learning of non-quantized networks, our training algorithm computes the gradient of an abstract representation of the actual network. Unlike existing approaches, our method can handle the discrete semantics of QNNs. Based on QA-IBP, we also develop a complete verification procedure for verifying the adversarial robustness of QNNs, which is guaranteed to terminate and produce a correct answer. Compared to existing approaches, the key advantage of our verification procedure is that it runs entirely on GPU or other accelerator devices. We demonstrate experimentally that our approach significantly outperforms existing methods and establish the new state-of-the-art for training and certifying the robustness of QNNs.
LGMay 24, 2022
Learning Stabilizing Policies in Stochastic Control SystemsĐorđe Žikelić, Mathias Lechner, Krishnendu Chatterjee et al.
In this work, we address the problem of learning provably stable neural network policies for stochastic control systems. While recent work has demonstrated the feasibility of certifying given policies using martingale theory, the problem of how to learn such policies is little explored. Here, we study the effectiveness of jointly learning a policy together with a martingale certificate that proves its stability using a single learning algorithm. We observe that the joint optimization problem becomes easily stuck in local minima when starting from a randomly initialized policy. Our results suggest that some form of pre-training of the policy is required for the joint optimization to repair and verify the policy successfully.
FLJul 13, 2022
Synthesis of Parametric Hybrid Automata from Time SeriesMiriam García Soto, Thomas A. Henzinger, Christian Schilling
We propose an algorithmic approach for synthesizing linear hybrid automata from time-series data. Unlike existing approaches, our approach provides a whole family of models. Each model in the family is guaranteed to capture the input data up to a precision error ε, in the following sense: For each time series, the model contains an execution that is ε-close to the data points. Our construction allows to effectively choose a model from this family with minimal precision error ε. We demonstrate the algorithm's efficiency and its ability to find precise models in two case studies.
OCMay 4, 2016
Invariant Clusters for Hybrid SystemsHui Kong, Sergiy Bogomolov, Christian Schilling et al.
In this paper, we propose an approach to automatically compute invariant clusters for semialgebraic hybrid systems. An invariant cluster for an ordinary differential equation (ODE) is a multivariate polynomial invariant g(u,x)=0, parametric in u, which can yield an infinite number of concrete invariants by assigning different values to u so that every trajectory of the system can be overapproximated precisely by a union of concrete invariants. For semialgebraic systems, which involve ODEs with multivariate polynomial vector flow, invariant clusters can be obtained by first computing the remainder of the Lie derivative of a template multivariate polynomial w.r.t. its Groebner basis and then solving the system of polynomial equations obtained from the coefficients of the remainder. Based on invariant clusters and sum-of-squares (SOS) programming, we present a new method for the safety verification of hybrid systems. Experiments on nonlinear benchmark systems from biology and control theory show that our approach is effective and efficient.
FLFeb 18, 2025
Resolving Nondeterminism with RandomnessThomas A. Henzinger, Keya Prakash, K. S. Thejaswini
In automata theory, while determinisation provides a standard route to solving many common problems in automata theory, some weak forms of nondeterminism can be dealt with in some problems without costly determinisation. For example, the handling of specifications given by nondeterministic automata over infinite words for the problems of reactive synthesis or runtime verification requires resolving nondeterministic choices without knowing the future of the input word. We define and study classes of $ω$-regular automata for which the nondeterminism can be resolved by a policy that uses a combination of memory and randomness on any input word, based solely on the prefix read so far. We examine two settings for providing the input word to an automaton. In the first setting, called adversarial resolvability, the input word is constructed letter-by-letter by an adversary, dependent on the resolver's previous decisions. In the second setting, called stochastic resolvability, the adversary pre-commits to an infinite word and reveals it letter-by-letter. In each setting, we require the existence of an almost-sure resolver, i.e., a policy that ensures that as long as the adversary provides a word in the language of the underlying nondeterministic automaton, the run constructed by the policy is accepting with probability 1. The class of automata that are adversarially resolvable is the well-studied class of history-deterministic automata. The case of stochastically resolvable automata, on the other hand, defines a novel class. Restricting the class of resolvers in both settings to stochastic policies without memory introduces two additional new classes of automata. We show that the new automaton classes offer interesting trade-offs between succinctness, expressivity, and computational complexity, providing a fine gradation between deterministic automata and nondeterministic automata.
FLAug 13, 2012
Finite Automata with Time-Delay Blocks (Extended Version)Krishnendu Chatterjee, Thomas A. Henzinger, Vinayak S. Prabhu
The notion of delays arises naturally in many computational models, such as, in the design of circuits, control systems, and dataflow languages. In this work, we introduce \emph{automata with delay blocks} (ADBs), extending finite state automata with variable time delay blocks, for deferring individual transition output symbols, in a discrete-time setting. We show that the ADB languages strictly subsume the regular languages, and are incomparable in expressive power to the context-free languages. We show that ADBs are closed under union, concatenation and Kleene star, and under intersection with regular languages, but not closed under complementation and intersection with other ADB languages. We show that the emptiness and the membership problems are decidable in polynomial time for ADBs, whereas the universality problem is undecidable. Finally we consider the linear-time model checking problem, i.e., whether the language of an ADB is contained in a regular language, and show that the model checking problem is PSPACE-complete.
FLJul 11, 2024
History-Determinism vs Fair SimulationUdi Boker, Thomas A. Henzinger, Karoliina Lehtinen et al.
An automaton is history-deterministic if its nondeterminism can be resolved on the fly, only using the prefix of the word read so far. This mild form of nondeterminism has attracted particular attention for its applications in synthesis problems. An automaton $A$ is guidable with respect to a class $C$ of automata if it can fairly simulate every automaton in $C$ whose language is contained in that of $A$. In other words, guidable automata are those for which inclusion and simulation coincide, making them particularly interesting for model-checking. We study the connection between these two notions, and specifically the question of when they coincide. For classes of automata on which they do, deciding guidability, an otherwise challenging decision problem, reduces to deciding history-determinism, a problem that is starting to be well-understood for many classes. We provide a selection of sufficient criteria for a class of automata to guarantee the coincidence of the notions, and use them to show that the notions coincide for the most common automata classes, among which are $ω$-regular automata and many infinite-state automata with safety and reachability acceptance conditions, including vector addition systems with states, one-counter nets, pushdown-, Parikh-, and timed-automata. We also demonstrate that history-determinism and guidability do not always coincide, for example, for the classes of timed automata with a fixed number of clocks.
23.5FLMar 30
Quantitative Language AutomataThomas A. Henzinger, Pavol Kebis, Nicolas Mazzocchi et al.
A quantitative word automaton (QWA) defines a function from infinite words to values. For example, every infinite run of a limit-average QWA A obtains a mean payoff, and every word w is assigned the maximal mean payoff obtained by nondeterministic runs of A over w. We introduce quantitative language automata (QLAs) that define functions from language generators (i.e., implementations) to values, where a language generator can be nonprobabilistic, defining a set of infinite words, or probabilistic, defining a probability measure over infinite words. A QLA consists of a QWA and a language aggregator. For example, given a QWA A, the infimum aggregator maps each language L to the greatest lower bound assigned by A to any word in L. For boolean value sets, QWAs capture trace properties, and QLAs capture hyperproperties. For more general value sets, QLAs serve as a specification language for a generalization of hyperproperties, called quantitative hyperproperties. A nonprobabilistic (resp. probabilistic) quantitative hyperproperty assigns a value to each set (resp. distribution) G of traces, e.g., the minimal (resp. expected) average response time exhibited by the traces in G (resp. by traces sampled according to G). We give several examples of quantitative hyperproperties and investigate three paradigmatic problems for QLAs: evaluation, nonemptiness, and universality. In the evaluation problem, given a QLA AA and an implementation G, we ask for the value that AA assigns to G. In the nonemptiness (resp. universality) problem, given a QLA AA, a threshold k, and a comparison in {>, >=} we ask whether AA assigns a value meeting the threshold to some (resp. every) language. We provide a comprehensive picture of decidability and complexity for these problems for QLAs with common aggregators as well as their restrictions to omega-regular languages and distributions generated by finite Markov chains.
49.0CRMar 20
Sharing The Secret: Distributed Privacy-Preserving MonitoringMahyar Karimi, K. S. Thejaswini, Roderick Bloem et al.
In traditional runtime verification, a system is typically observed by a monolithic monitor. Enforcing privacy in such settings is computationally expensive, as it necessitates heavy cryptographic primitives. Therefore, privacy-preserving monitoring remains impractical for real-time applications. In this work, we address this scalability challenge by distributing the monitor across multiple parties -- at least one of which is honest. This architecture enables the use of efficient secret-sharing schemes instead of computationally intensive cryptography, dramatically reducing over-head while maintaining strong privacy guarantees. While existing secret-sharing approaches are typically limited to one-shot executions which do not maintain an internal state, we introduce a protocol tailored for continuous monitoring that supports repeated evaluations over an evolving internal state (kept secret from the system and the monitoring entities). We implement our approach using the MP-SPDZ framework. Our experiments demonstrate that, under these architectural assumptions, our protocol is significantly more scalable than existing alternatives.
28.2AIMay 24
Energy Shields for FairnessFilip Cano, Thomas A. Henzinger, Konstantin Kueffner
Runtime fairness is not a one-time constraint but a dynamic property evaluated over a sequence of decisions. To ensure fairness at runtime, it is necessary to account for past decisions, information neglected by conventional, static classifiers. Traditional fairness shields enforce runtime fairness abruptly, by intervening \emph{deterministically} whenever a sequence of decisions violates the target for a running fairness measure. This motivates our \emph{main conceptual contribution: \textbf{energy shields}.} An energy shield is a novel, lightweight, adaptive controller that monitors a sequence of decisions and intervenes \emph{probabilistically} to ensure runtime fairness smoothly, by utilizing physics-inspired energy functions to nudge the sequence toward fairness: the more unfair the decisions, the stronger the nudging force becomes. This makes energy shields the \emph{\textbf{first}} fairness shields to provide both \emph{short-term safety and long-term liveness guarantees}. Safety ensures that the running fairness measure stays within a running target interval with high probability, and liveness ensures that the limit of the fairness measure lies within the limit target interval. Intuitively, the short-term specifies the tolerated fairness values and the long-term specifies the desired fairness values. We also provide a synthesis procedure for constructing the least intrusive energy shield for a given target specification, and demonstrate its efficiency experimentally. We evaluate our energy shields against existing fairness shields through the lens of short- and long-term fairness.
39.1FLMay 13
Decoupled Planning for Multiple Omega-Regular ObjectivesGuy Avni, Thomas A. Henzinger, Kaushik Mallik et al.
We study the problem of generating paths on a graph that satisfy a collection of ω-regular objectives. We propose a decoupled framework in which each objective is assigned to an independent agent that selects a local policy, while a scheduler -- oblivious to the graph and objective -- dynamically composes these policies into a single path. We ask when such a composition satisfies all objectives, assuming their conjunction is realizable. The framework enables modular policy design but raises fundamental compositional challenges. We show that even extremely fair deterministic schedulers do not ensure correctness, and that stochastic schedulers, while necessary, are insufficient without coordination. For safety objectives, we demonstrate that fully decentralized implementations are impossible, and we introduce a protocol for synchronizing on maximal safe actions. For non-safety objectives, we introduce conventions -- simple, a priori restrictions agreed upon before the graph or objectives are revealed -- that guarantee satisfaction of all objectives when followed by all agents. We characterize minimally restrictive conventions for major subclasses of ω-regular objectives. In particular, Büchi objectives admit universal composition of finite-memory policies without scheduler communication; co-Büchi objectives require only knowledge of whether the agent was scheduled; and parity objectives additionally require knowledge of which agent was scheduled.
13.0FLMay 12
Extending QuAK with Nested Quantitative AutomataThomas A. Henzinger, Nicolas Mazzocchi, N. Ege Saraç et al.
Quantitative automata (QAs) extend finite-state omega-automata with weighted transitions to specify quantitative system properties. However, their finite weight sets rule out properties like average response time, where response times can be arbitrarily large. Nested quantitative automata (NQAs) overcome this limitation: a parent automaton spawns child automata to compute unbounded values over finite infixes and aggregates them into a final result. Despite this expressiveness, NQAs have lacked practical tool support to date. We address this gap by extending the Quantitative Automata Kit (QuAK), a software tool for QA analysis, to support NQAs. Our core contribution is implementing a suite of flattening procedures that reduce NQAs to QAs, leveraging QuAK's existing decision procedures. These reductions preserve the answers to threshold decision problems, while allowing users to specify properties in the more expressive NQA formalism. The tool handles all combinations of parent aggregators, including limits and averages, and child functions, extrema and monotonic or bounded summations, for which emptiness and universality are known to be decidable. Experiments on response-time and resource-consumption benchmarks demonstrate QuAK's effectiveness.
13.5AIMay 8
Multi-Environment POMDPs with Finite-Horizon ObjectivesLéonard Brice, Filip Cano, Krishnendu Chatterjee et al.
Partially Observable Markov Decision Processes (POMDPs) are systems in which one agent interacts with a stochastic environment, and receives only partial information about the current state. In a multi-environment POMDP (MEPOMDP), the initial state is unknown, and assumed to be adversarially chosen. In this work we focus on computing the optimal value and policy in MEPOMDPs with finite-horizon objectives. That problem is known to be PSPACE-complete in POMDPs. Our main results are as follows: (1) we establish that it is also PSPACE-complete in the more general setting of MEPOMDPs; (2) we present a practical algorithm and evaluate it on classical benchmarks, significantly outperforming the only previously known algorithm.
AIDec 16, 2024
Fairness Shields: Safeguarding against Biased Decision MakersFilip Cano, Thomas A. Henzinger, Bettina Könighofer et al.
As AI-based decision-makers increasingly influence human lives, it is a growing concern that their decisions are often unfair or biased with respect to people's sensitive attributes, such as gender and race. Most existing bias prevention measures provide probabilistic fairness guarantees in the long run, and it is possible that the decisions are biased on specific instances of short decision sequences. We introduce fairness shielding, where a symbolic decision-maker -- the fairness shield -- continuously monitors the sequence of decisions of another deployed black-box decision-maker, and makes interventions so that a given fairness criterion is met while the total intervention costs are minimized. We present four different algorithms for computing fairness shields, among which one guarantees fairness over fixed horizons, and three guarantee fairness periodically after fixed intervals. Given a distribution over future decisions and their intervention costs, our algorithms solve different instances of bounded-horizon optimal control problems with different levels of computational costs and optimality guarantees. Our empirical evaluation demonstrates the effectiveness of these shields in ensuring fairness while maintaining cost efficiency across various scenarios.
AIMay 31, 2025
Monitoring Robustness and Individual FairnessAshutosh Gupta, Thomas A. Henzinger, Konstantin Kueffner et al.
Input-output robustness appears in various different forms in the literature, such as robustness of AI models to adversarial or semantic perturbations and individual fairness of AI models that make decisions about humans. We propose runtime monitoring of input-output robustness of deployed, black-box AI models, where the goal is to design monitors that would observe one long execution sequence of the model, and would raise an alarm whenever it is detected that two similar inputs from the past led to dissimilar outputs. This way, monitoring will complement existing offline ``robustification'' approaches to increase the trustworthiness of AI decision-makers. We show that the monitoring problem can be cast as the fixed-radius nearest neighbor (FRNN) search problem, which, despite being well-studied, lacks suitable online solutions. We present our tool Clemont, which offers a number of lightweight monitors, some of which use upgraded online variants of existing FRNN algorithms, and one uses a novel algorithm based on binary decision diagrams -- a data-structure commonly used in software and hardware verification. We have also developed an efficient parallelization technique that can substantially cut down the computation time of monitors for which the distance between input-output pairs is measured using the $L_\infty$ norm. Using standard benchmarks from the literature of adversarial and semantic robustness and individual fairness, we perform a comparative study of different monitors in \tool, and demonstrate their effectiveness in correctly detecting robustness violations at runtime.
SYDec 21, 2024
Predictive Monitoring of Black-Box Dynamical SystemsThomas A. Henzinger, Fabian Kresse, Kaushik Mallik et al.
We study the problem of predictive runtime monitoring of black-box dynamical systems with quantitative safety properties. The black-box setting stipulates that the exact semantics of the dynamical system and the controller are unknown, and that we are only able to observe the state of the controlled (aka, closed-loop) system at finitely many time points. We present a novel framework for predicting future states of the system based on the states observed in the past. The numbers of past states and of predicted future states are parameters provided by the user. Our method is based on a combination of Taylor's expansion and the backward difference operator for numerical differentiation. We also derive an upper bound on the prediction error under the assumption that the system dynamics and the controller are smooth. The predicted states are then used to predict safety violations ahead in time. Our experiments demonstrate practical applicability of our method for complex black-box systems, showing that it is computationally lightweight and yet significantly more accurate than the state-of-the-art predictive safety monitoring techniques.
LGDec 17, 2024
Neural Control and Certificate Repair via Runtime MonitoringEmily Yu, Đorđe Žikelić, Thomas A. Henzinger
Learning-based methods provide a promising approach to solving highly non-linear control tasks that are often challenging for classical control methods. To ensure the satisfaction of a safety property, learning-based methods jointly learn a control policy together with a certificate function for the property. Popular examples include barrier functions for safety and Lyapunov functions for asymptotic stability. While there has been significant progress on learning-based control with certificate functions in the white-box setting, where the correctness of the certificate function can be formally verified, there has been little work on ensuring their reliability in the black-box setting where the system dynamics are unknown. In this work, we consider the problems of certifying and repairing neural network control policies and certificate functions in the black-box setting. We propose a novel framework that utilizes runtime monitoring to detect system behaviors that violate the property of interest under some initially trained neural network policy and certificate. These violating behaviors are used to extract new training data, that is used to re-train the neural network policy and the certificate function and to ultimately repair them. We demonstrate the effectiveness of our approach empirically by using it to repair and to boost the safety rate of neural network policies learned by a state-of-the-art method for learning-based control on two autonomous system control tasks.
GTMar 7
Randomise Alone, Reach as a TeamLéonard Brice, Thomas A. Henzinger, Alipasha Montaseri et al.
We study concurrent graph games where n players cooperate against an opponent to reach a set of target states. Unlike traditional settings, we study distributed randomisation: team players do not share a source of randomness, and their private random sources are hidden from the opponent and from each other. We show that memoryless strategies are sufficient for the threshold problem (deciding whether there is a strategy for the team that ensures winning with probability that exceeds a threshold), a result that not only places the problem in the Existential Theory of the Reals (\exists\mathbb{R}) but also enables the construction of value iteration algorithms. We additionally show that the threshold problem is NP-hard. For the almost-sure reachability problem, we prove NP-completeness. We introduce Individually Randomised Alternating-time Temporal Logic (IRATL). This logic extends the standard ATL framework to reason about probability thresholds, with semantics explicitly designed for coalitions that lack a shared source of randomness. On the practical side, we implement and evaluate a solver for the threshold and almost-sure problem based on the algorithms that we develop.
AIJul 28, 2025
Algorithmic Fairness: A Runtime PerspectiveFilip Cano, Thomas A. Henzinger, Konstantin Kueffner
Fairness in AI is traditionally studied as a static property evaluated once, over a fixed dataset. However, real-world AI systems operate sequentially, with outcomes and environments evolving over time. This paper proposes a framework for analysing fairness as a runtime property. Using a minimal yet expressive model based on sequences of coin tosses with possibly evolving biases, we study the problems of monitoring and enforcing fairness expressed in either toss outcomes or coin biases. Since there is no one-size-fits-all solution for either problem, we provide a summary of monitoring and enforcement strategies, parametrised by environment dynamics, prediction horizon, and confidence thresholds. For both problems, we present general results under simple or minimal assumptions. We survey existing solutions for the monitoring problem for Markovian and additive dynamics, and existing solutions for the enforcement problem in static settings with known dynamics.
SCJul 16, 2025
Formal Verification of Neural Certificates Done DynamicallyThomas A. Henzinger, Konstantin Kueffner, Emily Yu
Neural certificates have emerged as a powerful tool in cyber-physical systems control, providing witnesses of correctness. These certificates, such as barrier functions, often learned alongside control policies, once verified, serve as mathematical proofs of system safety. However, traditional formal verification of their defining conditions typically faces scalability challenges due to exhaustive state-space exploration. To address this challenge, we propose a lightweight runtime monitoring framework that integrates real-time verification and does not require access to the underlying control policy. Our monitor observes the system during deployment and performs on-the-fly verification of the certificate over a lookahead region to ensure safety within a finite prediction horizon. We instantiate this framework for ReLU-based control barrier functions and demonstrate its practical effectiveness in a case study. Our approach enables timely detection of safety violations and incorrect certificates with minimal overhead, providing an effective but lightweight alternative to the static verification of the certificates.
LGJul 3, 2025
Monitoring of Static FairnessThomas A. Henzinger, Mahyar Karimi, Konstantin Kueffner et al.
Machine-learned systems are in widespread use for making decisions about humans, and it is important that they are fair, i.e., not biased against individuals based on sensitive attributes. We present a general framework of runtime verification of algorithmic fairness for systems whose models are unknown, but are assumed to have a Markov chain structure, with or without full observation of the state space. We introduce a specification language that can model many common algorithmic fairness properties, such as demographic parity, equal opportunity, and social burden. We build monitors that observe a long sequence of events as generated by a given system, and output, after each observation, a quantitative estimate of how fair or biased the system was on that run until that point in time. The estimate is proven to be correct modulo a variable error bound and a given confidence level, where the error bound gets tighter as the observed sequence gets longer. We present two categories of monitoring algorithms, namely ones with a uniform error bound across all time points, and ones with weaker non-uniform, pointwise error bounds at different time points. Our monitoring algorithms use statistical tools that are adapted to suit the dynamic requirements of monitoring and the special needs of the fairness specifications. Using a prototype implementation, we show how we can monitor if a bank is fair in giving loans to applicants from different social backgrounds, and if a college is fair in admitting students while maintaining a reasonable financial burden on the society. In these experiments, our monitors took less than a millisecond to update their verdicts after each observation.
CYMay 25, 2023
Monitoring Algorithmic FairnessThomas A. Henzinger, Mahyar Karimi, Konstantin Kueffner et al.
Machine-learned systems are in widespread use for making decisions about humans, and it is important that they are fair, i.e., not biased against individuals based on sensitive attributes. We present runtime verification of algorithmic fairness for systems whose models are unknown, but are assumed to have a Markov chain structure. We introduce a specification language that can model many common algorithmic fairness properties, such as demographic parity, equal opportunity, and social burden. We build monitors that observe a long sequence of events as generated by a given system, and output, after each observation, a quantitative estimate of how fair or biased the system was on that run until that point in time. The estimate is proven to be correct modulo a variable error bound and a given confidence level, where the error bound gets tighter as the observed sequence gets longer. Our monitors are of two types, and use, respectively, frequentist and Bayesian statistical inference techniques. While the frequentist monitors compute estimates that are objectively correct with respect to the ground truth, the Bayesian monitors compute estimates that are correct subject to a given prior belief about the system's model. Using a prototype implementation, we show how we can monitor if a bank is fair in giving loans to applicants from different social backgrounds, and if a college is fair in admitting students while maintaining a reasonable financial burden on the society. Although they exhibit different theoretical complexities in certain cases, in our experiments, both frequentist and Bayesian monitors took less than a millisecond to update their verdicts after each observation.
CYMay 8, 2023
Runtime Monitoring of Dynamic Fairness PropertiesThomas A. Henzinger, Mahyar Karimi, Konstantin Kueffner et al.
A machine-learned system that is fair in static decision-making tasks may have biased societal impacts in the long-run. This may happen when the system interacts with humans and feedback patterns emerge, reinforcing old biases in the system and creating new biases. While existing works try to identify and mitigate long-run biases through smart system design, we introduce techniques for monitoring fairness in real time. Our goal is to build and deploy a monitor that will continuously observe a long sequence of events generated by the system in the wild, and will output, with each event, a verdict on how fair the system is at the current point in time. The advantages of monitoring are two-fold. Firstly, fairness is evaluated at run-time, which is important because unfair behaviors may not be eliminated a priori, at design-time, due to partial knowledge about the system and the environment, as well as uncertainties and dynamic changes in the system and the environment, such as the unpredictability of human behavior. Secondly, monitors are by design oblivious to how the monitored system is constructed, which makes them suitable to be used as trusted third-party fairness watchdogs. They function as computationally lightweight statistical estimators, and their correctness proofs rely on the rigorous analysis of the stochastic process that models the assumptions about the underlying dynamics of the system. We show, both in theory and experiments, how monitors can warn us (1) if a bank's credit policy over time has created an unfair distribution of credit scores among the population, and (2) if a resource allocator's allocation policy over time has made unfair allocations. Our experiments demonstrate that the monitors introduce very low overhead. We believe that runtime monitoring is an important and mathematically rigorous new addition to the fairness toolbox.
LGDec 17, 2021
Stability Verification in Stochastic Control Systems via Neural Network SupermartingalesMathias Lechner, Đorđe Žikelić, Krishnendu Chatterjee et al.
We consider the problem of formally verifying almost-sure (a.s.) asymptotic stability in discrete-time nonlinear stochastic control systems. While verifying stability in deterministic control systems is extensively studied in the literature, verifying stability in stochastic control systems is an open problem. The few existing works on this topic either consider only specialized forms of stochasticity or make restrictive assumptions on the system, rendering them inapplicable to learning algorithms with neural network policies. In this work, we present an approach for general nonlinear stochastic control problems with two novel aspects: (a) instead of classical stochastic extensions of Lyapunov functions, we use ranking supermartingales (RSMs) to certify a.s.~asymptotic stability, and (b) we present a method for learning neural network RSMs. We prove that our approach guarantees a.s.~asymptotic stability of the system and provides the first method to obtain bounds on the stabilization time, which stochastic Lyapunov functions do not. Finally, we validate our approach experimentally on a set of nonlinear stochastic reinforcement learning environments with neural network policies.
LGNov 4, 2021
Infinite Time Horizon Safety of Bayesian Neural NetworksMathias Lechner, Đorđe Žikelić, Krishnendu Chatterjee et al.
Bayesian neural networks (BNNs) place distributions over the weights of a neural network to model uncertainty in the data and the network's prediction. We consider the problem of verifying safety when running a Bayesian neural network policy in a feedback loop with infinite time horizon systems. Compared to the existing sampling-based approaches, which are inapplicable to the infinite time horizon setting, we train a separate deterministic neural network that serves as an infinite time horizon safety certificate. In particular, we show that the certificate network guarantees the safety of the system over a subset of the BNN weight posterior's support. Our method first computes a safe weight set and then alters the BNN's weight posterior to reject samples outside this set. Moreover, we show how to extend our approach to a safe-exploration reinforcement learning setting, in order to avoid unsafe trajectories during the training of the policy. We evaluate our approach on a series of reinforcement learning benchmarks, including non-Lyapunovian safety specifications.
LGJul 18, 2021
GoTube: Scalable Stochastic Verification of Continuous-Depth ModelsSophie Gruenbacher, Mathias Lechner, Ramin Hasani et al.
We introduce a new stochastic verification algorithm that formally quantifies the behavioral robustness of any time-continuous process formulated as a continuous-depth model. Our algorithm solves a set of global optimization (Go) problems over a given time horizon to construct a tight enclosure (Tube) of the set of all process executions starting from a ball of initial states. We call our algorithm GoTube. Through its construction, GoTube ensures that the bounding tube is conservative up to a desired probability and up to a desired tightness. GoTube is implemented in JAX and optimized to scale to complex continuous-depth neural network models. Compared to advanced reachability analysis tools for time-continuous neural networks, GoTube does not accumulate overapproximation errors between time steps and avoids the infamous wrapping effect inherent in symbolic techniques. We show that GoTube substantially outperforms state-of-the-art verification tools in terms of the size of the initial ball, speed, time-horizon, task completion, and scalability on a large set of experiments. GoTube is stable and sets the state-of-the-art in terms of its ability to scale to time horizons well beyond what has been previously possible.
LGMar 15, 2021
Adversarial Training is Not Ready for Robot LearningMathias Lechner, Ramin Hasani, Radu Grosu et al.
Adversarial training is an effective method to train deep learning models that are resilient to norm-bounded perturbations, with the cost of nominal performance drop. While adversarial training appears to enhance the robustness and safety of a deep model deployed in open-world decision-critical applications, counterintuitively, it induces undesired behaviors in robot learning settings. In this paper, we show theoretically and experimentally that neural controllers obtained via adversarial training are subjected to three types of defects, namely transient, systematic, and conditional errors. We first generalize adversarial training to a safety-domain optimization scheme allowing for more generic specifications. We then prove that such a learning process tends to cause certain error profiles. We support our theoretical results by a thorough experimental safety analysis in a robot-learning task. Our results suggest that adversarial training is not yet ready for robot learning.
AIDec 15, 2020
Scalable Verification of Quantized Neural Networks (Technical Report)Thomas A. Henzinger, Mathias Lechner, Đorđe Žikelić
Formal verification of neural networks is an active topic of research, and recent advances have significantly increased the size of the networks that verification tools can handle. However, most methods are designed for verification of an idealized model of the actual network which works over real arithmetic and ignores rounding imprecisions. This idealization is in stark contrast to network quantization, which is a technique that trades numerical precision for computational efficiency and is, therefore, often applied in practice. Neglecting rounding errors of such low-bit quantized neural networks has been shown to lead to wrong conclusions about the network's correctness. Thus, the desired approach for verifying quantized neural networks would be one that takes these rounding errors into account. In this paper, we show that verifying the bit-exact implementation of quantized neural networks with bit-vector specifications is PSPACE-hard, even though verifying idealized real-valued networks and satisfiability of bit-vector specifications alone are each in NP. Furthermore, we explore several practical heuristics toward closing the complexity gap between idealized and bit-exact verification. In particular, we propose three techniques for making SMT-based verification of quantized neural networks more scalable. Our experiments demonstrate that our proposed methods allow a speedup of up to three orders of magnitude over existing approaches.
LGSep 14, 2020
Into the Unknown: Active Monitoring of Neural NetworksAnna Lukina, Christian Schilling, Thomas A. Henzinger
Neural-network classifiers achieve high accuracy when predicting the class of an input that they were trained to identify. Maintaining this accuracy in dynamic environments, where inputs frequently fall outside the fixed set of initially known classes, remains a challenge. The typical approach is to detect inputs from novel classes and retrain the classifier on an augmented dataset. However, not only the classifier but also the detection mechanism needs to adapt in order to distinguish between newly learned and yet unknown input classes. To address this challenge, we introduce an algorithmic framework for active monitoring of a neural network. A monitor wrapped in our framework operates in parallel with the neural network and interacts with a human user via a series of interpretable labeling queries for incremental adaptation. In addition, we propose an adaptive quantitative monitor to improve precision. An experimental evaluation on a diverse set of benchmarks with varying numbers of classes confirms the benefits of our active monitoring framework in dynamic scenarios.
LOMay 25, 2020
Formal Methods with a Touch of MagicParand Alizadeh Alamdari, Guy Avni, Thomas A. Henzinger et al.
Machine learning and formal methods have complimentary benefits and drawbacks. In this work, we address the controller-design problem with a combination of techniques from both fields. The use of black-box neural networks in deep reinforcement learning (deep RL) poses a challenge for such a combination. Instead of reasoning formally about the output of deep RL, which we call the {\em wizard}, we extract from it a decision-tree based model, which we refer to as the {\em magic book}. Using the extracted model as an intermediary, we are able to handle problems that are infeasible for either deep RL or formal methods by themselves. First, we suggest, for the first time, combining a magic book in a synthesis procedure. We synthesize a stand-alone correct-by-design controller that enjoys the favorable performance of RL. Second, we incorporate a magic book in a bounded model checking (BMC) procedure. BMC allows us to find numerous traces of the plant under the control of the wizard, which a user can use to increase the trustworthiness of the wizard and direct further training.
LGNov 20, 2019
Outside the Box: Abstraction-Based Monitoring of Neural NetworksThomas A. Henzinger, Anna Lukina, Christian Schilling
Neural networks have demonstrated unmatched performance in a range of classification tasks. Despite numerous efforts of the research community, novelty detection remains one of the significant limitations of neural networks. The ability to identify previously unseen inputs as novel is crucial for our understanding of the decisions made by neural networks. At runtime, inputs not falling into any of the categories learned during training cannot be classified correctly by the neural network. Existing approaches treat the neural network as a black box and try to detect novel inputs based on the confidence of the output predictions. However, neural networks are not trained to reduce their confidence for novel inputs, which limits the effectiveness of these approaches. We propose a framework to monitor a neural network by observing the hidden layers. We employ a common abstraction from program analysis - boxes - to identify novel behaviors in the monitored layers, i.e., inputs that cause behaviors outside the box. For each neuron, the boxes range over the values seen in training. The framework is efficient and flexible to achieve a desired trade-off between raising false warnings and detecting novel inputs. We illustrate the performance and the robustness to variability in the unknown classes on popular image-classification benchmarks.
SEApr 15, 2019
Compositional Specifications for ioco TestingPrzemyslaw Daca, Thomas A. Henzinger, Willibald Krenn et al.
Model-based testing is a promising technology for black-box software and hardware testing, in which test cases are generated automatically from high-level specifications. Nowadays, systems typically consist of multiple interacting components and, due to their complexity, testing presents a considerable portion of the effort and cost in the design process. Exploiting the compositional structure of system specifications can considerably reduce the effort in model-based testing. Moreover, inferring properties about the system from testing its individual components allows the designer to reduce the amount of integration testing. In this paper, we study compositional properties of the IOCO-testing theory. We propose a new approach to composition and hiding operations, inspired by contract-based design and interface theories. These operations preserve behaviors that are compatible under composition and hiding, and prune away incompatible ones. The resulting specification characterizes the input sequences for which the unit testing of components is sufficient to infer the correctness of component integration without the need for further tests. We provide a methodology that uses these results to minimize integration testing effort, but also to detect potential weaknesses in specifications. While we focus on asynchronous models and the IOCO conformance relation, the resulting methodology can be applied to a broader class of systems.
SEOct 9, 2012
Interface Simulation DistancesPavol Černý, Martin Chmelík, Thomas A. Henzinger et al.
The classical (boolean) notion of refinement for behavioral interfaces of system components is the alternating refinement preorder. In this paper, we define a distance for interfaces, called interface simulation distance. It makes the alternating refinement preorder quantitative by, intuitively, tolerating errors (while counting them) in the alternating simulation game. We show that the interface simulation distance satisfies the triangle inequality, that the distance between two interfaces does not increase under parallel composition with a third interface, and that the distance between two interfaces can be bounded from above and below by distances between abstractions of the two interfaces. We illustrate the framework, and the properties of the distances under composition of interfaces, with two case studies.
AIJul 4, 2012
Counterexample-guided PlanningKrishnendu Chatterjee, Thomas A. Henzinger, Ranjit Jhala et al.
Planning in adversarial and uncertain environments can be modeled as the problem of devising strategies in stochastic perfect information games. These games are generalizations of Markov decision processes (MDPs): there are two (adversarial) players, and a source of randomness. The main practical obstacle to computing winning strategies in such games is the size of the state space. In practice therefore, one typically works with abstractions of the model. The diffculty is to come up with an abstraction that is neither too coarse to remove all winning strategies (plans), nor too fine to be intractable. In verification, the paradigm of counterexample-guided abstraction refinement has been successful to construct useful but parsimonious abstractions automatically. We extend this paradigm to probabilistic models (namely, perfect information games and, as a special case, MDPs). This allows us to apply the counterexample-guided abstraction paradigm to the AI planning problem. As special cases, we get planning algorithms for MDPs and deterministic systems that automatically construct system abstractions.