61.8LGJun 3
Explainably Safe Reinforcement LearningSabine Rieder, Stefan Pranger, Debraj Chakraborty et al.
Trust in a decision-making system requires both safety guarantees and the ability to interpret and understand its behavior. This is particularly important for learned systems, whose decision-making processes are often highly opaque. Shielding is a prominent model-based technique for enforcing safety in reinforcement learning. However, because shields are automatically synthesized using rigorous formal methods, their decisions are often similarly difficult for humans to interpret. Recently, decision trees became customary to represent controllers and policies. However, since shields are inherently non-deterministic, their decision tree representations become too large to be explainable in practice. To address this challenge, we propose a novel approach for explainable safe RL that enhances trust by providing human-interpretable explanations of the shield's decisions. Our method represents the shielding policy as a hierarchy of decision trees, offering top-down, case-based explanations. At design time, we use a world model to analyze the safety risks of executing actions in given states. Based on this analysis, we construct both the shield and a high-level decision tree that classifies states into risk categories (safe, critical, dangerous, unsafe), explaining why a situation may be safety-critical. At runtime, we generate localized decision trees that explain which actions are allowed and why others are deemed unsafe. Our method facilitates explainability of the safety aspect in safe-by-shielding reinforcement learning, requires no additional information beyond what is already used for shielding, incurs minimal overhead, and integrates readily into existing shielded RL pipelines. In our experiments, we compute explanations using decision trees that are several orders of magnitude smaller than the original shield.
SYJul 25, 2018
Continuous-Time Markov Decisions based on Partial ExplorationPranav Ashok, Yuliya Butkova, Holger Hermanns et al.
We provide a framework for speeding up algorithms for time-bounded reachability analysis of continuous-time Markov decision processes. The principle is to find a small, but almost equivalent subsystem of the original system and only analyse the subsystem. Candidates for the subsystem are identified through simulations and iteratively enlarged until runs are represented in the subsystem with high enough probability. The framework is thus dual to that of abstraction refinement. We instantiate the framework in several ways with several traditional algorithms and experimentally confirm orders-of-magnitude speed ups in many cases.
SYSep 12, 2011
Fixed-delay Events in Generalized Semi-Markov Processes RevisitedTomáš Brázdil, Jan Krčál, Jan Křetínský et al.
We study long run average behavior of generalized semi-Markov processes with both fixed-delay events as well as variable-delay events. We show that allowing two fixed-delay events and one variable-delay event may cause an unstable behavior of a GSMP. In particular, we show that a frequency of a given state may not be defined for almost all runs (or more generally, an invariant measure may not exist). We use this observation to disprove several results from literature. Next we study GSMP with at most one fixed-delay event combined with an arbitrary number of variable-delay events. We prove that such a GSMP always possesses an invariant measure which means that the frequencies of states are always well defined and we provide algorithms for approximation of these frequencies. Additionally, we show that the positive results remain valid even if we allow an arbitrary number of reasonably restricted fixed-delay events.
AIApr 19, 2023
Stopping Criteria for Value Iteration on Stochastic Games with Quantitative ObjectivesJan Křetínský, Tobias Meggendorfer, Maximilian Weininger
A classic solution technique for Markov decision processes (MDP) and stochastic games (SG) is value iteration (VI). Due to its good practical performance, this approximative approach is typically preferred over exact techniques, even though no practical bounds on the imprecision of the result could be given until recently. As a consequence, even the most used model checkers could return arbitrarily wrong results. Over the past decade, different works derived stopping criteria, indicating when the precision reaches the desired level, for various settings, in particular MDP with reachability, total reward, and mean payoff, and SG with reachability. In this paper, we provide the first stopping criteria for VI on SG with total reward and mean payoff, yielding the first anytime algorithms in these settings. To this end, we provide the solution in two flavours: First through a reduction to the MDP case and second directly on SG. The former is simpler and automatically utilizes any advances on MDP. The latter allows for more local computations, heading towards better practical efficiency. Our solution unifies the previously mentioned approaches for MDP and SG and their underlying ideas. To achieve this, we isolate objective-specific subroutines as well as identify objective-independent concepts. These structural concepts, while surprisingly simple, form the very essence of the unified solution.
LGAug 26, 2022
Algebraically Explainable Controllers: Decision Trees and Support Vector Machines Join ForcesFlorian Jüngermann, Jan Křetínský, Maximilian Weininger
Recently, decision trees (DT) have been used as an explainable representation of controllers (a.k.a. strategies, policies, schedulers). Although they are often very efficient and produce small and understandable controllers for discrete systems, complex continuous dynamics still pose a challenge. In particular, when the relationships between variables take more complex forms, such as polynomials, they cannot be obtained using the available DT learning procedures. In contrast, support vector machines provide a more powerful representation, capable of discovering many such relationships, but not in an explainable form. Therefore, we suggest to combine the two frameworks in order to obtain an understandable representation over richer, domain-relevant algebraic predicates. We demonstrate and evaluate the proposed method experimentally on established benchmarks.
SYMay 23, 2019
Semi-Quantitative Abstraction and Analysis of Chemical Reaction NetworksMilan Češka, Jan Křetínský
Analysis of large continuous-time stochastic systems is a computationally intensive task. In this work we focus on population models arising from chemical reaction networks (CRNs), which play a fundamental role in analysis and design of biochemical systems. Many relevant CRNs are particularly challenging for existing techniques due to complex dynamics including stochasticity, stiffness or multimodal population distributions. We propose a novel approach allowing not only to predict, but also to explain both the transient and steady-state behaviour. It focuses on qualitative description of the behaviour and aims at quantitative precision only in orders of magnitude. Firstly, we abstract the CRN into a compact model preserving rough timing information, distinguishing only signifcinatly different populations, but capturing relevant sequences of behaviour. Secondly, we approximately analyse the most probable temporal behaviours of the model through most probable transitions. As demonstrated on complex CRNs from literature, our approach reproduces the known results, but in contrast to the state-of-the-art methods, it runs with virtually no computational cost and thus offers unprecedented~scalability.
SYJun 3, 2022
PAC Statistical Model Checking of Mean Payoff in Discrete- and Continuous-Time MDPChaitanya Agarwal, Shibashis Guha, Jan Křetínský et al.
Markov decision processes (MDP) and continuous-time MDP (CTMDP) are the fundamental models for non-deterministic systems with probabilistic uncertainty. Mean payoff (a.k.a. long-run average reward) is one of the most classic objectives considered in their context. We provide the first algorithm to compute mean payoff probably approximately correctly in unknown MDP; further, we extend it to unknown CTMDP. We do not require any knowledge of the state space, only a lower bound on the minimum transition probability, which has been advocated in literature. In addition to providing probably approximately correct (PAC) bounds for our algorithm, we also demonstrate its practical nature by running experiments on standard benchmarks.
SYJan 21, 2011
Measuring Performance of Continuous-Time Stochastic Processes using Timed AutomataTomáš Brázdil, Jan Krčál, Jan Křetínský et al.
We propose deterministic timed automata (DTA) as a model-independent language for specifying performance and dependability measures over continuous-time stochastic processes. Technically, these measures are defined as limit frequencies of locations (control states) of a DTA that observes computations of a given stochastic process. Then, we study the properties of DTA measures over semi-Markov processes in greater detail. We show that DTA measures over semi-Markov processes are well-defined with probability one, and there are only finitely many values that can be assumed by these measures with positive probability. We also give an algorithm which approximates these values and the associated probabilities up to an arbitrarily small given precision. Thus, we obtain a general and effective framework for analysing DTA measures over semi-Markov processes.
LOJul 20, 2023
Syntactic vs Semantic Linear Abstraction and Refinement of Neural NetworksCalvin Chau, Jan Křetínský, Stefanie Mohr
Abstraction is a key verification technique to improve scalability. However, its use for neural networks is so far extremely limited. Previous approaches for abstracting classification networks replace several neurons with one of them that is similar enough. We can classify the similarity as defined either syntactically (using quantities on the connections between neurons) or semantically (on the activation values of neurons for various inputs). Unfortunately, the previous approaches only achieve moderate reductions, when implemented at all. In this work, we provide a more flexible framework where a neuron can be replaced with a linear combination of other neurons, improving the reduction. We apply this approach both on syntactic and semantic abstractions, and implement and evaluate them experimentally. Further, we introduce a refinement method for our abstractions, allowing for finding a better balance between reduction and precision.
LGFeb 13
Talking with Verifiers: Automatic Specification Generation for Neural Network VerificationYizhak Y. Elboher, Reuven Peleg, Zhouxing Shi et al.
Neural network verification tools currently support only a narrow class of specifications, typically expressed as low-level constraints over raw inputs and outputs. This limitation significantly hinders their adoption and practical applicability across diverse application domains where correctness requirements are naturally expressed at a higher semantic level. This challenge is rooted in the inherent nature of deep neural networks, which learn internal representations that lack an explicit mapping to human-understandable features. To address this, we bridge this gap by introducing a novel component to the verification pipeline, making existing verification tools applicable to a broader range of domains and specification styles. Our framework enables users to formulate specifications in natural language, which are then automatically analyzed and translated into formal verification queries compatible with state-of-the-art neural network verifiers. We evaluate our approach on both structured and unstructured datasets, demonstrating that it successfully verifies complex semantic specifications that were previously inaccessible. Our results show that this translation process maintains high fidelity to user intent while incurring low computational overhead, thereby substantially extending the applicability of formal DNN verification to real-world, high-level requirements.
29.3LOApr 27
Logic of Fuzzy PathsKush Grover, Pratham Gupta, Jan Křetínský
We introduce a new family of temporal logics intended for specifications in motion planning (MP). It builds upon the signal temporal logic (STL), which is a linear-time logic over real-valued signals that possess quantitative semantics and thus became popular in the areas of cyber-physical systems, robotics, and specifically robot MP. However, in contrast to STL, the proposed logic works with paths as first-class citizens, separating the concerns of geometry and of logic. This in turn leads to simpler and more understandable formulae, and a more refined notion of satisfaction being able to reflect also preferences over behaviours. Technically, the logic is built on fuzzy, time-varying signal constraints. As a consequence of this expressivity, it is (i) more usable for human-given specifications in MP and (ii) more amenable to learning specifications from demonstrations than other logics. The former is important for the traditional style of verification in robot MP; the latter is becoming recognized as crucial for mining data-given tasks and controller synthesis in human-aware MP. We expose the advantages of our proposed logic on examples and show the versatility and flexibility of the framework on a number of scenarios. Finally, we give a learning algorithm with a prototype implementation and discuss the possibilities of model checking and monitoring.
22.2AIApr 27
SemML 2.0: Synthesizing Controllers for LTLJan Křetínský, Tobias Meggendorfer, Maximilian Prokop
Synthesizing a reactive system from specifications given in linear temporal logic (LTL) is a classical problem, finding its applications in safety-critical systems design. These systems are typically represented using either Mealy machines or AIGER circuits. We present the second version of SemML, which outperforms all state-of-the-art tools for finding either solution. Aside from implementing the classical automata-theoretic approach, our tool utilizes partial exploration and machine-learning guidance for obtaining solutions efficiently, and numerous heuristics and improvements of classic algorithms for extracting small representations of these solutions. We evaluate our tool against the existing state-of-the-art tools (in particular Strix, LtlSynt, and the previous version of SemML) on the dataset of the synthesis competition SYNTCOMP. We show that we solve significantly more instances and do so much faster than other tools, while maintaining state-of-the-art solution quality.
AIMay 23, 2024
stl2vec: Semantic and Interpretable Vector Representation of Temporal LogicGaia Saveri, Laura Nenzi, Luca Bortolussi et al.
Integrating symbolic knowledge and data-driven learning algorithms is a longstanding challenge in Artificial Intelligence. Despite the recognized importance of this task, a notable gap exists due to the discreteness of symbolic representations and the continuous nature of machine-learning computations. One of the desired bridges between these two worlds would be to define semantically grounded vector representation (feature embedding) of logic formulae, thus enabling to perform continuous learning and optimization in the semantic space of formulae. We tackle this goal for knowledge expressed in Signal Temporal Logic (STL) and devise a method to compute continuous embeddings of formulae with several desirable properties: the embedding (i) is finite-dimensional, (ii) faithfully reflects the semantics of the formulae, (iii) does not require any learning but instead is defined from basic principles, (iv) is interpretable. Another significant contribution lies in demonstrating the efficacy of the approach in two tasks: learning model checking, where we predict the probability of requirements being satisfied in stochastic processes; and integrating the embeddings into a neuro-symbolic framework, to constrain the output of a deep-learning generative model to comply to a given logical specification.
LOMay 27, 2025
Stopping Criteria for Value Iteration on Concurrent Stochastic Reachability and Safety GamesMarta Grobelna, Jan Křetínský, Maximilian Weininger
We consider two-player zero-sum concurrent stochastic games (CSGs) played on graphs with reachability and safety objectives. These include degenerate classes such as Markov decision processes or turn-based stochastic games, which can be solved by linear or quadratic programming; however, in practice, value iteration (VI) outperforms the other approaches and is the most implemented method. Similarly, for CSGs, this practical performance makes VI an attractive alternative to the standard theoretical solution via the existential theory of reals. VI starts with an under-approximation of the sought values for each state and iteratively updates them, traditionally terminating once two consecutive approximations are $ε$-close. However, this stopping criterion lacks guarantees on the precision of the approximation, which is the goal of this work. We provide bounded (a.k.a. interval) VI for CSGs: it complements standard VI with a converging sequence of over-approximations and terminates once the over- and under-approximations are $ε$-close.
SYMar 14, 2024
Learning Algorithms for Verification of Markov Decision ProcessesTomáš Brázdil, Krishnendu Chatterjee, Martin Chmelik et al.
We present a general framework for applying learning algorithms and heuristical guidance to the verification of Markov decision processes (MDPs). The primary goal of our techniques is to improve performance by avoiding an exhaustive exploration of the state space, instead focussing on particularly relevant areas of the system, guided by heuristics. Our work builds on the previous results of Br{á}zdil et al., significantly extending it as well as refining several details and fixing errors. The presented framework focuses on probabilistic reachability, which is a core problem in verification, and is instantiated in two distinct scenarios. The first assumes that full knowledge of the MDP is available, in particular precise transition probabilities. It performs a heuristic-driven partial exploration of the model, yielding precise lower and upper bounds on the required probability. The second tackles the case where we may only sample the MDP without knowing the exact transition dynamics. Here, we obtain probabilistic guarantees, again in terms of both the lower and upper bounds, which provides efficient stopping criteria for the approximation. In particular, the latter is an extension of statistical model-checking (SMC) for unbounded properties in MDPs. In contrast to other related approaches, we do not restrict our attention to time-bounded (finite-horizon) or discounted properties, nor assume any particular structural properties of the MDP.
AIMay 26, 2023
MULTIGAIN 2.0: MDP controller synthesis for multiple mean-payoff, LTL and steady-state constraintsSeverin Bals, Alexandros Evangelidis, Jan Křetínský et al.
We present MULTIGAIN 2.0, a major extension to the controller synthesis tool MULTIGAIN, built on top of the probabilistic model checker PRISM. This new version extends MULTIGAIN's multi-objective capabilities, by allowing for the formal verification and synthesis of controllers for probabilistic systems with multi-dimensional long-run average reward structures, steady-state constraints, and linear temporal logic properties. Additionally, MULTIGAIN 2.0 can modify the underlying linear program to prevent unbounded-memory and other unintuitive solutions and visualizes Pareto curves, in the two- and three-dimensional cases, to facilitate trade-off analysis in multi-objective scenarios.
LOJan 24, 2022
Learning Model Checking and the Kernel Trick for Signal Temporal Logic on Stochastic ProcessesLuca Bortolussi, Giuseppe Maria Gallo, Jan Křetínský et al.
We introduce a similarity function on formulae of signal temporal logic (STL). It comes in the form of a kernel function, well known in machine learning as a conceptually and computationally efficient tool. The corresponding kernel trick allows us to circumvent the complicated process of feature extraction, i.e. the (typically manual) effort to identify the decisive properties of formulae so that learning can be applied. We demonstrate this consequence and its advantages on the task of predicting (quantitative) satisfaction of STL formulae on stochastic processes: Using our kernel and the kernel trick, we learn (i) computationally efficiently (ii) a practically precise predictor of satisfaction, (iii) avoiding the difficult task of finding a way to explicitly turn formulae into vectors of numbers in a sensible way. We back the high precision we have achieved in the experiments by a theoretically sound PAC guarantee, ensuring our procedure efficiently delivers a close-to-optimal predictor.
AIMay 31, 2021
LTL-Constrained Steady-State Policy SynthesisJan Křetínský
Decision-making policies for agents are often synthesized with the constraint that a formal specification of behaviour is satisfied. Here we focus on infinite-horizon properties. On the one hand, Linear Temporal Logic (LTL) is a popular example of a formalism for qualitative specifications. On the other hand, Steady-State Policy Synthesis (SSPS) has recently received considerable attention as it provides a more quantitative and more behavioural perspective on specifications, in terms of the frequency with which states are visited. Finally, rewards provide a classic framework for quantitative properties. In this paper, we study Markov decision processes (MDP) with the specification combining all these three types. The derived policy maximizes the reward among all policies ensuring the LTL specification with the given probability and adhering to the steady-state constraints. To this end, we provide a unified solution reducing the multi-type specification to a multi-dimensional long-run average reward. This is enabled by Limit-Deterministic Büchi Automata (LDBA), recently studied in the context of LTL model checking on MDP, and allows for an elegant solution through a simple linear programme. The algorithm also extends to the general $ω$-regular properties and runs in time polynomial in the sizes of the MDP as well as the LDBA.
AIJan 15, 2021
dtControl 2.0: Explainable Strategy Representation via Decision Tree Learning Steered by ExpertsPranav Ashok, Mathias Jackermeier, Jan Křetínský et al.
Recent advances have shown how decision trees are apt data structures for concisely representing strategies (or controllers) satisfying various objectives. Moreover, they also make the strategy more explainable. The recent tool dtControl had provided pipelines with tools supporting strategy synthesis for hybrid systems, such as SCOTS and Uppaal Stratego. We present dtControl 2.0, a new version with several fundamentally novel features. Most importantly, the user can now provide domain knowledge to be exploited in the decision tree learning process and can also interactively steer the process based on the dynamically provided information. To this end, we also provide a graphical user interface. It allows for inspection and re-computation of parts of the result, suggesting as well as receiving advice on predicates, and visual simulation of the decision-making process. Besides, we interface model checkers of probabilistic systems, namely Storm and PRISM and provide dedicated support for categorical enumeration-type state variables. Consequently, the controllers are more explainable and smaller.
ROJun 30, 2020
Formalizing and Guaranteeing* Human-Robot InteractionHadas Kress-Gazit, Kerstin Eder, Guy Hoffman et al.
Robot capabilities are maturing across domains, from self-driving cars, to bipeds and drones. As a result, robots will soon no longer be confined to safety-controlled industrial settings; instead, they will directly interact with the general public. The growing field of Human-Robot Interaction (HRI) studies various aspects of this scenario - from social norms to joint action to human-robot teams and more. Researchers in HRI have made great strides in developing models, methods, and algorithms for robots acting with and around humans, but these "computational HRI" models and algorithms generally do not come with formal guarantees and constraints on their operation. To enable human-interactive robots to move from the lab to real-world deployments, we must address this gap. This article provides an overview of verification, validation and synthesis techniques used to create demonstrably trustworthy systems, describes several HRI domains that could benefit from such techniques, and provides a roadmap for the challenges and the research needed to create formalized and guaranteed human-robot interaction.
LOJun 24, 2020
DeepAbstract: Neural Network Abstraction for Accelerating VerificationPranav Ashok, Vahid Hashemi, Jan Křetínský et al.
While abstraction is a classic tool of verification to scale it up, it is not used very often for verifying neural networks. However, it can help with the still open task of scaling existing algorithms to state-of-the-art network architectures. We introduce an abstraction framework applicable to fully-connected feed-forward neural networks based on clustering of neurons that behave similarly on some inputs. For the particular case of ReLU, we additionally provide error bounds incurred by the abstraction. We show how the abstraction reduces the size of the network, while preserving its accuracy, and how verification results on the abstract network can be transferred back to the original network.
LGFeb 12, 2020
dtControl: Decision Tree Learning Algorithms for Controller RepresentationPranav Ashok, Mathias Jackermeier, Pushpak Jagtap et al.
Decision tree learning is a popular classification technique most commonly used in machine learning applications. Recent work has shown that decision trees can be used to represent provably-correct controllers concisely. Compared to representations using lookup tables or binary decision diagrams, decision trees are smaller and more explainable. We present dtControl, an easily extensible tool for representing memoryless controllers as decision trees. We give a comprehensive evaluation of various decision tree learning algorithms applied to 10 case studies arising out of correct-by-construction controller synthesis. These algorithms include two new techniques, one for using arbitrary linear binary classifiers in the decision tree learning, and one novel approach for determinizing controllers during the decision tree construction. In particular the latter turns out to be extremely efficient, yielding decision trees with a single-digit number of decision nodes on 5 of the case studies.
SYJun 17, 2019
Of Cores: A Partial-Exploration Framework for Markov Decision ProcessesJan Křetínský, Tobias Meggendorfer
We introduce a framework for approximate analysis of Markov decision processes (MDP) with bounded-, unbounded-, and infinite-horizon properties. The main idea is to identify a "core" of an MDP, i.e., a subsystem where we provably remain with high probability, and to avoid computation on the less relevant rest of the state space. Although we identify the core using simulations and statistical techniques, it allows for rigorous error bounds in the analysis. Consequently, we obtain efficient analysis algorithms based on partial exploration for various settings, including the challenging case of strongly connected systems.
SYMay 10, 2019
PAC Statistical Model Checking for Markov Decision Processes and Stochastic GamesPranav Ashok, Jan Křetínský, Maximilian Weininger
Statistical model checking (SMC) is a technique for analysis of probabilistic systems that may be (partially) unknown. We present an SMC algorithm for (unbounded) reachability yielding probably approximately correct (PAC) guarantees on the results. We consider both the setting (i) with no knowledge of the transition function (with the only quantity required a bound on the minimum transition probability) and (ii) with knowledge of the topology of the underlying graph. On the one hand, it is the first algorithm for stochastic games. On the other hand, it is the first practical algorithm even for Markov decision processes. Compared to previous approaches where PAC guarantees require running times longer than the age of universe even for systems with a handful of states, our algorithm often yields reasonably precise results within minutes, not requiring the knowledge of mixing time or the topology of the whole model.
AIApr 24, 2018
Learning-Based Mean-Payoff Optimization in an Unknown MDP under Omega-Regular ConstraintsJan Křetínský, Guillermo A. Pérez, Jean-François Raskin
We formalize the problem of maximizing the mean-payoff value with high probability while satisfying a parity objective in a Markov decision process (MDP) with unknown probabilistic transition function and unknown reward function. Assuming the support of the unknown transition function and a lower bound on the minimal transition probability are known in advance, we show that in MDPs consisting of a single end component, two combinations of guarantees on the parity and mean-payoff objectives can be achieved depending on how much memory one is willing to use. (i) For all $ε$ and $γ$ we can construct an online-learning finite-memory strategy that almost-surely satisfies the parity objective and which achieves an $ε$-optimal mean payoff with probability at least $1 - γ$. (ii) Alternatively, for all $ε$ and $γ$ there exists an online-learning infinite-memory strategy that satisfies the parity objective surely and which achieves an $ε$-optimal mean payoff with probability at least $1 - γ$. We extend the above results to MDPs consisting of more than one end component in a natural way. Finally, we show that the aforementioned guarantees are tight, i.e. there are MDPs for which stronger combinations of the guarantees cannot be ensured.
SYAug 31, 2017
Value Iteration for Long-run Average Reward in Markov Decision ProcessesPranav Ashok, Krishnendu Chatterjee, Przemyslaw Daca et al.
Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Long-run average rewards provide a mathematically elegant formalism for expressing long term performance. Value iteration (VI) is one of the simplest and most efficient algorithmic approaches to MDPs with other properties, such as reachability objectives. Unfortunately, a naive extension of VI does not work for MDPs with long-run average rewards, as there is no known stopping criterion. In this work our contributions are threefold. (1) We refute a conjecture related to stopping criteria for MDPs with long-run average rewards. (2) We present two practical algorithms for MDPs with long-run average rewards based on VI. First, we show that a combination of applying VI locally for each maximal end-component (MEC) and VI for reachability objectives can provide approximation guarantees. Second, extending the above approach with a simulation-guided on-demand variant of VI, we present an anytime algorithm that is able to deal with very large models. (3) Finally, we present experimental results showing that our methods significantly outperform the standard approaches on several benchmarks.