Sebastian Junges

AI
h-index10
41papers
1,341citations
Novelty46%
AI Score56

41 Papers

AIApr 2, 2022
Safe Reinforcement Learning via Shielding under Partial Observability

Steven Carr, Nils Jansen, Sebastian Junges et al.

Safe exploration is a common problem in reinforcement learning (RL) that aims to prevent agents from making disastrous decisions while exploring their environment. A family of approaches to this problem assume domain knowledge in the form of a (partial) model of this environment to decide upon the safety of an action. A so-called shield forces the RL agent to select only safe actions. However, for adoption in various applications, one must look beyond enforcing safety and also ensure the applicability of RL with good performance. We extend the applicability of shields via tight integration with state-of-the-art deep RL, and provide an extensive, empirical study in challenging, sparse-reward environments under partial observability. We show that a carefully integrated shield ensures safety and can improve the convergence rate and final performance of RL agents. We furthermore show that a shield can be used to bootstrap state-of-the-art RL agents: they remain safe after initial learning in a shielded setting, allowing us to disable a potentially too conservative shield eventually.

LOJul 17, 2018
Permissive Finite-State Controllers of POMDPs using Parameter Synthesis

Sebastian Junges, Nils Jansen, Ralf Wimmer et al.

We study finite-state controllers (FSCs) for partially observable Markov decision processes (POMDPs) that are provably correct with respect to given specifications. The key insight is that computing (randomised) FSCs on POMDPs is equivalent to - and computationally as hard as - synthesis for parametric Markov chains (pMCs). This correspondence allows to use tools for parameter synthesis in pMCs to compute correct-by-construction FSCs on POMDPs for a variety of specifications. Our experimental evaluation shows comparable performance to well-known POMDP solvers.

FLJul 19, 2023
Learning Formal Specifications from Membership and Preference Queries

Ameesh Shah, Marcell Vazquez-Chanlatte, Sebastian Junges et al.

Active learning is a well-studied approach to learning formal specifications, such as automata. In this work, we extend active specification learning by proposing a novel framework that strategically requests a combination of membership labels and pair-wise preferences, a popular alternative to membership labels. The combination of pair-wise preferences and membership labels allows for a more flexible approach to active specification learning, which previously relied on membership labels only. We instantiate our framework in two different domains, demonstrating the generality of our approach. Our results suggest that learning from both modalities allows us to robustly and conveniently identify specifications via membership and preferences.

LGSep 15, 2022
COOL-MC: A Comprehensive Tool for Reinforcement Learning and Model Checking

Dennis Gross, Nils Jansen, Sebastian Junges et al.

This paper presents COOL-MC, a tool that integrates state-of-the-art reinforcement learning (RL) and model checking. Specifically, the tool builds upon the OpenAI gym and the probabilistic model checker Storm. COOL-MC provides the following features: (1) a simulator to train RL policies in the OpenAI gym for Markov decision processes (MDPs) that are defined as input for Storm, (2) a new model builder for Storm, which uses callback functions to verify (neural network) RL policies, (3) formal abstractions that relate models and policies specified in OpenAI gym or Storm, and (4) algorithms to obtain bounds on the performance of so-called permissive policies. We describe the components and architecture of COOL-MC and demonstrate its features on multiple benchmark environments.

AIJun 6, 2022
Abstraction-Refinement for Hierarchical Probabilistic Models

Sebastian Junges, Matthijs T. J. Spaan

Markov decision processes are a ubiquitous formalism for modelling systems with non-deterministic and probabilistic behavior. Verification of these models is subject to the famous state space explosion problem. We alleviate this problem by exploiting a hierarchical structure with repetitive parts. This structure not only occurs naturally in robotics, but also in probabilistic programs describing, e.g., network protocols. Such programs often repeatedly call a subroutine with similar behavior. In this paper, we focus on a local case, in which the subroutines have a limited effect on the overall system state. The key ideas to accelerate analysis of such programs are (1) to treat the behavior of the subroutine as uncertain and only remove this uncertainty by a detailed analysis if needed, and (2) to abstract similar subroutines into a parametric template, and then analyse this template. These two ideas are embedded into an abstraction-refinement loop that analyses hierarchical MDPs. A prototypical implementation shows the efficacy of the approach.

SEMar 16
Probabilistic Model Checking Taken by Storm

Matthias Volk, Linus Heck, Sebastian Junges et al.

This tutorial paper presents a hands-on perspective on probabilistic model checking with the Storm model checker. Storm is a decade-old model checker that excels in performance and a rich Python-based ecosystem, which makes it easy to integrate in various workflows. This tutorial focuses on Markov decision processes (MDP), which are popular in a variety of fields. It demonstrates the basic workflow, from Python-based modeling, model checking with a variety of properties, to the extraction of policies. Further, it showcases the support for recent topics that focus on different types of uncertainty, such as interval MDP and POMDP, and the ability to quickly implement simple algorithms on top of existing data structures.

LOApr 8
Tractable Hyperproperties for MDPs

Lina Gerlach, Tobias Winkler, Erika Ábrahám et al.

Probabilistic hyperproperties describe probabilistic relations between multiple sets of executions in a stochastic system. Prominent examples include information-theoretic characterizations of security and privacy policies. However, model checking for existing probabilistic hyperlogics, such as HyperPCTL and PHL, is undecidable in Markov decision processes (MDPs). In this paper, we study an underexplored problem: the verification of fragments of probabilistic hyperproperties that relate the probabilities of different events to each other, possibly across independent executions of an MDP. Representative verification questions include: Can two different target states be reached from the same initial state with the same probability? (different events), Can a given target state be reached from two different initial states with the same probability? (same event, independent executions), and natural combinations of these forms. Besides reachability, our relational probabilistic properties cover safety, Büchi, and coBüchi objectives. They can also be combined conjunctively, thereby generalizing standard multi-objective MDP properties. We provide efficient algorithms for relevant classes of relational properties, while proving computational hardness and completeness results for others. An implementation of our approach outperforms solvers for more general probabilistic hyperlogics by orders of magnitude on the subset of their benchmarks that lies within our fragment.

LGFeb 25
Error-awareness Accelerates Active Automata Learning

Loes Kruger, Sebastian Junges, Jurriaan Rot

Active automata learning (AAL) algorithms can learn a behavioral model of a system from interacting with it. The primary challenge remains scaling to larger models, in particular in the presence of many possible inputs to the system. Modern AAL algorithms fail to scale even if, in every state, most inputs lead to errors. In various challenging problems from the literature, these errors are observable, i.e., they emit a known error output. Motivated by these problems, we study learning these systems more efficiently. Further, we consider various degrees of knowledge about which inputs are non-error producing at which state. For each level of knowledge, we provide a matching adaptation of the state-of-the-art AAL algorithm L# to make the most of this domain knowledge. Our empirical evaluation demonstrates that the methods accelerate learning by orders of magnitude with strong but realistic domain knowledge to a single order of magnitude with limited domain knowledge.

LOMay 12
Fast Computation of Conditional Probabilities in MDPs and Markov Chain Families

Milan Češka, Sebastian Junges, Luko van der Maas et al.

Computing optimal conditional reachability probabilities in Markov decision processes (MDPs) is tractable by a reduction to reachability probabilities. Yet, this reduction yields cyclic, challenging MDPs that are often notoriously hard to solve. We present an alternative, practically efficient method to compute optimal conditional reachabilities. This new method is numerically stable, can decide the threshold problem in linear time on acyclic MDPs, and yields performance comparable to standard reachability queries. We also integrate the method in an abstraction-refinement framework to analyse millions of Markov chains at once. We demonstrate the efficacy of the new methods on benchmarks from Bayesian network analysis, probabilistic programs, and runtime monitoring and show speed-ups up to multiple orders of magnitude.

LOMay 11
Shields to Guarantee Probabilistic Safety in MDPs

Linus Heck, Filip Macák, Roman Andriushchenko et al.

Shielding is a prominent model-based technique to ensure safety of autonomous agents. Classical shielding aims to ensure that nothing bad ever happens and comes with strong guarantees about safety and maximal permissiveness. However, shielding systems for probabilistic safety, where something bad is allowed to happen with an acceptable probability, has proven to be more intricate. This paper presents a formal framework that conservatively extends classical shields to probabilistic safety. In this framework, we (i) demonstrate the impossibility of preserving the strong guarantees on safety and permissiveness, (ii) provide natural shields with weaker guarantees, and (iii) introduce offline and online shield constructions ensuring strong safety guarantees. The empirical evaluation highlights the practical advantages of the new shields, as well as their computational feasibility.

AIMar 5, 2018Code
Synthesis in pMDPs: A Tale of 1001 Parameters

Murat Cubuktepe, Nils Jansen, Sebastian Junges et al.

This paper considers parametric Markov decision processes (pMDPs) whose transitions are equipped with affine functions over a finite set of parameters. The synthesis problem is to find a parameter valuation such that the instantiated pMDP satisfies a specification under all strategies. We show that this problem can be formulated as a quadratically-constrained quadratic program (QCQP) and is non-convex in general. To deal with the NP-hardness of such problems, we exploit a convex-concave procedure (CCP) to iteratively obtain local optima. An appropriate interplay between CCP solvers and probabilistic model checkers creates a procedure --- realized in the open-source tool PROPhESY --- that solves the synthesis problem for models with thousands of parameters.

LONov 11, 2025
Constrained and Robust Policy Synthesis with Satisfiability-Modulo-Probabilistic-Model-Checking

Linus Heck, Filip Macák, Milan Češka et al.

The ability to compute reward-optimal policies for given and known finite Markov decision processes (MDPs) underpins a variety of applications across planning, controller synthesis, and verification. However, we often want policies (1) to be robust, i.e., they perform well on perturbations of the MDP and (2) to satisfy additional structural constraints regarding, e.g., their representation or implementation cost. Computing such robust and constrained policies is indeed computationally more challenging. This paper contributes the first approach to effectively compute robust policies subject to arbitrary structural constraints using a flexible and efficient framework. We achieve flexibility by allowing to express our constraints in a first-order theory over a set of MDPs, while the root for our efficiency lies in the tight integration of satisfiability solvers to handle the combinatorial nature of the problem and probabilistic model checking algorithms to handle the analysis of MDPs. Experiments on a few hundred benchmarks demonstrate the feasibility for constrained and robust policy synthesis and the competitiveness with state-of-the-art methods for various fragments of the problem.

AIMay 8, 2024
Imprecise Probabilities Meet Partial Observability: Game Semantics for Robust POMDPs

Eline M. Bovy, Marnix Suilen, Sebastian Junges et al.

Partially observable Markov decision processes (POMDPs) rely on the key assumption that probability distributions are precisely known. Robust POMDPs (RPOMDPs) alleviate this concern by defining imprecise probabilities, referred to as uncertainty sets. While robust MDPs have been studied extensively, work on RPOMDPs is limited and primarily focuses on algorithmic solution methods. We expand the theoretical understanding of RPOMDPs by showing that 1) different assumptions on the uncertainty sets affect optimal policies and values; 2) RPOMDPs have a partially observable stochastic game (POSG) semantic; and 3) the same RPOMDP with different assumptions leads to semantically different POSGs and, thus, different policies and values. These novel semantics for RPOMDPs give access to results for POSGs, studied in game theory; concretely, we show the existence of a Nash equilibrium. Finally, we classify the existing RPOMDP literature using our semantics, clarifying under which uncertainty assumptions these existing works operate.

AIDec 18, 2023
Factored Online Planning in Many-Agent POMDPs

Maris F. L. Galesloot, Thiago D. Simão, Sebastian Junges et al.

In centralized multi-agent systems, often modeled as multi-agent partially observable Markov decision processes (MPOMDPs), the action and observation spaces grow exponentially with the number of agents, making the value and belief estimation of single-agent online planning ineffective. Prior work partially tackles value estimation by exploiting the inherent structure of multi-agent settings via so-called coordination graphs. Additionally, belief estimation methods have been improved by incorporating the likelihood of observations into the approximation. However, the challenges of value estimation and belief estimation have only been tackled individually, which prevents existing methods from scaling to settings with many agents. Therefore, we address these challenges simultaneously. First, we introduce weighted particle filtering to a sample-based online planner for MPOMDPs. Second, we present a scalable approximation of the belief. Third, we bring an approach that exploits the typical locality of agent interactions to novel online planning algorithms for MPOMDPs operating on a so-called sparse particle filter tree. Our experimental evaluation against several state-of-the-art baselines shows that our methods (1) are competitive in settings with only a few agents and (2) improve over the baselines in the presence of many agents.

AIMay 14, 2025
Robust Finite-Memory Policy Gradients for Hidden-Model POMDPs

Maris F. L. Galesloot, Roman Andriushchenko, Milan Češka et al.

Partially observable Markov decision processes (POMDPs) model specific environments in sequential decision-making under uncertainty. Critically, optimal policies for POMDPs may not be robust against perturbations in the environment. Hidden-model POMDPs (HM-POMDPs) capture sets of different environment models, that is, POMDPs with a shared action and observation space. The intuition is that the true model is hidden among a set of potential models, and it is unknown which model will be the environment at execution time. A policy is robust for a given HM-POMDP if it achieves sufficient performance for each of its POMDPs. We compute such robust policies by combining two orthogonal techniques: (1) a deductive formal verification technique that supports tractable robust policy evaluation by computing a worst-case POMDP within the HM-POMDP, and (2) subgradient ascent to optimize the candidate policy for a worst-case POMDP. The empirical evaluation shows that, compared to various baselines, our approach (1) produces policies that are more robust and generalize better to unseen POMDPs, and (2) scales to HM-POMDPs that consist of over a hundred thousand environments.

AIFeb 10, 2025
Tighter Value-Function Approximations for POMDPs

Merlijn Krale, Wietze Koops, Sebastian Junges et al.

Solving partially observable Markov decision processes (POMDPs) typically requires reasoning about the values of exponentially many state beliefs. Towards practical performance, state-of-the-art solvers use value bounds to guide this reasoning. However, sound upper value bounds are often computationally expensive to compute, and there is a tradeoff between the tightness of such bounds and their computational cost. This paper introduces new and provably tighter upper value bounds than the commonly used fast informed bound. Our empirical evaluation shows that, despite their additional computational overhead, the new upper bounds accelerate state-of-the-art POMDP solvers on a wide range of benchmarks.

LOJun 28, 2024
State Matching and Multiple References in Adaptive Active Automata Learning

Loes Kruger, Sebastian Junges, Jurriaan Rot

Active automata learning (AAL) is a method to infer state machines by interacting with black-box systems. Adaptive AAL aims to reduce the sample complexity of AAL by incorporating domain specific knowledge in the form of (similar) reference models. Such reference models appear naturally when learning multiple versions or variants of a software system. In this paper, we present state matching, which allows flexible use of the structure of these reference models by the learner. State matching is the main ingredient of adaptive L#, a novel framework for adaptive learning, built on top of L#. Our empirical evaluation shows that adaptive L# improves the state of the art by up to two orders of magnitude.

LGJun 2, 2024
Policy Verification in Stochastic Dynamical Systems Using Logarithmic Neural Certificates

Thom Badings, Wietze Koops, Sebastian Junges et al.

We consider the verification of neural network policies for discrete-time stochastic systems with respect to reach-avoid specifications. We use a learner-verifier procedure that learns a certificate for the specification, represented as a neural network. Verifying that this neural network certificate is a so-called reach-avoid supermartingale (RASM) proves the satisfaction of a reach-avoid specification. Existing approaches for such a verification task rely on computed Lipschitz constants of neural networks. These approaches struggle with large Lipschitz constants, especially for reach-avoid specifications with high threshold probabilities. We present two key contributions to obtain smaller Lipschitz constants than existing approaches. First, we introduce logarithmic RASMs (logRASMs), which take exponentially smaller values than RASMs and hence have lower theoretical Lipschitz constants. Second, we present a fast method to compute tighter upper bounds on Lipschitz constants based on weighted norms. Our empirical evaluation shows we can consistently verify the satisfaction of reach-avoid specifications with probabilities as high as 99.9999%.

AIMay 9, 2024
Approximate Dec-POMDP Solving Using Multi-Agent A*

Wietze Koops, Sebastian Junges, Nils Jansen

We present an A*-based algorithm to compute policies for finite-horizon Dec-POMDPs. Our goal is to sacrifice optimality in favor of scalability for larger horizons. The main ingredients of our approach are (1) using clustered sliding window memory, (2) pruning the A* search tree, and (3) using novel A* heuristics. Our experiments show competitive performance to the state-of-the-art. Moreover, for multiple benchmarks, we achieve superior performance. In addition, we provide an A* algorithm that finds upper bounds for the optimum, tailored towards problems with long horizons. The main ingredient is a new heuristic that periodically reveals the state, thereby limiting the number of reachable beliefs. Our experiments demonstrate the efficacy and scalability of the approach.

LGMay 1, 2023
Efficient Sensitivity Analysis for Parametric Robust Markov Chains

Thom Badings, Sebastian Junges, Ahmadreza Marandi et al.

We provide a novel method for sensitivity analysis of parametric robust Markov chains. These models incorporate parameters and sets of probability distributions to alleviate the often unrealistic assumption that precise probabilities are available. We measure sensitivity in terms of partial derivatives with respect to the uncertain transition probabilities regarding measures such as the expected reward. As our main contribution, we present an efficient method to compute these partial derivatives. To scale our approach to models with thousands of parameters, we present an extension of this method that selects the subset of $k$ parameters with the highest partial derivative. Our methods are based on linear programming and differentiating these programs around a given value for the parameters. The experiments show the applicability of our approach on models with over a million states and thousands of parameters. Moreover, we embed the results within an iterative learning scheme that profits from having access to a dedicated sensitivity analysis.

CVDec 1, 2021
Querying Labelled Data with Scenario Programs for Sim-to-Real Validation

Edward Kim, Jay Shenoy, Sebastian Junges et al.

Simulation-based testing of autonomous vehicles (AVs) has become an essential complement to road testing to ensure safety. Consequently, substantial research has focused on searching for failure scenarios in simulation. However, a fundamental question remains: are AV failure scenarios identified in simulation meaningful in reality, i.e., are they reproducible on the real system? Due to the sim-to-real gap arising from discrepancies between simulated and real sensor data, a failure scenario identified in simulation can be either a spurious artifact of the synthetic sensor data or an actual failure that persists with real sensor data. An approach to validate simulated failure scenarios is to identify instances of the scenario in a corpus of real data, and check if the failure persists on the real data. To this end, we propose a formal definition of what it means for a labelled data item to match an abstract scenario, encoded as a scenario program using the SCENIC probabilistic programming language. Using this definition, we develop a querying algorithm which, given a scenario program and a labelled dataset, finds the subset of data matching the scenario. Experiments demonstrate that our algorithm is accurate and efficient on a variety of realistic traffic scenarios, and scales to a reasonable number of agents.

OCJun 30, 2021
Convex Optimization for Parameter Synthesis in MDPs

Murat Cubuktepe, Nils Jansen, Sebastian Junges et al.

Probabilistic model checking aims to prove whether a Markov decision process (MDP) satisfies a temporal logic specification. The underlying methods rely on an often unrealistic assumption that the MDP is precisely known. Consequently, parametric MDPs (pMDPs) extend MDPs with transition probabilities that are functions over unspecified parameters. The parameter synthesis problem is to compute an instantiation of these unspecified parameters such that the resulting MDP satisfies the temporal logic specification. We formulate the parameter synthesis problem as a quadratically constrained quadratic program (QCQP), which is nonconvex and is NP-hard to solve in general. We develop two approaches that iteratively obtain locally optimal solutions. The first approach exploits the so-called convex-concave procedure (CCP), and the second approach utilizes a sequential convex programming (SCP) method. The techniques improve the runtime and scalability by multiple orders of magnitude compared to black-box CCP and SCP by merging ideas from convex optimization and probabilistic model checking. We demonstrate the approaches on a satellite collision avoidance problem with hundreds of thousands of states and tens of thousands of parameters and their scalability on a wide range of commonly used benchmarks.

LOMay 26, 2021
Runtime Monitoring for Markov Decision Processes

Sebastian Junges, Hazem Torfah, Sanjit A. Seshia

We investigate the problem of monitoring partially observable systems with nondeterministic and probabilistic dynamics. In such systems, every state may be associated with a risk, e.g., the probability of an imminent crash. During runtime, we obtain partial information about the system state in form of observations. The monitor uses this information to estimate the risk of the (unobservable) current system state. Our results are threefold. First, we show that extensions of state estimation approaches do not scale due the combination of nondeterminism and probabilities. While convex hull algorithms improve the practical runtime, they do not prevent an exponential memory blowup. Second, we present a tractable algorithm based on model checking conditional reachability probabilities. Third, we provide prototypical implementations and manifest the applicability of our algorithms to a range of benchmarks. The results highlight the possibilities and boundaries of our novel algorithms.

ROMar 9, 2021
Entropy-Guided Control Improvisation

Marcell Vazquez-Chanlatte, Sebastian Junges, Daniel J. Fremont et al.

High level declarative constraints provide a powerful (and popular) way to define and construct control policies; however, most synthesis algorithms do not support specifying the degree of randomness (unpredictability) of the resulting controller. In many contexts, e.g., patrolling, testing, behavior prediction,and planning on idealized models, predictable or biased controllers are undesirable. To address these concerns, we introduce the \emph{Entropic Reactive Control Improvisation} (ERCI) framework and algorithm which supports synthesizing control policies for stochastic games that are declaratively specified by (i) a \emph{hard constraint} specifying what must occur, (ii) a \emph{soft constraint} specifying what typically occurs, and (iii) a \emph{randomization constraint} specifying the unpredictability and variety of the controller, as quantified using causal entropy. This framework, extends the state of the art by supporting arbitrary combinations of adversarial and probabilistic uncertainty in the environment. ERCI enables a flexible modeling formalism which we argue, theoretically and empirically, remains tractable.

LOJan 29, 2021
Inductive Synthesis for Probabilistic Programs Reaches New Horizons

Roman Andriushchenko, Milan Ceska, Sebastian Junges et al.

This paper presents a novel method for the automated synthesis of probabilistic programs. The starting point is a program sketch representing a finite family of finite-state Markov chains with related but distinct topologies, and a PCTL specification. The method builds on a novel inductive oracle that greedily generates counter-examples (CEs) for violating programs and uses them to prune the family. These CEs leverage the semantics of the family in the form of bounds on its best- and worst-case behaviour provided by a deductive oracle using an MDP abstraction. The method further monitors the performance of the synthesis and adaptively switches between the inductive and deductive reasoning. Our experiments demonstrate that the novel CE construction provides a significantly faster and more effective pruning strategy leading to acceleration of the synthesis process on a wide range of benchmarks. For challenging problems, such as the synthesis of decentralized partially-observable controllers, we reduce the run-time from a day to minutes.

AISep 24, 2020
Robust Finite-State Controllers for Uncertain POMDPs

Murat Cubuktepe, Nils Jansen, Sebastian Junges et al.

Uncertain partially observable Markov decision processes (uPOMDPs) allow the probabilistic transition and observation functions of standard POMDPs to belong to a so-called uncertainty set. Such uncertainty, referred to as epistemic uncertainty, captures uncountable sets of probability distributions caused by, for instance, a lack of data available. We develop an algorithm to compute finite-memory policies for uPOMDPs that robustly satisfy specifications against any admissible distribution. In general, computing such policies is theoretically and practically intractable. We provide an efficient solution to this problem in four steps. (1) We state the underlying problem as a nonconvex optimization problem with infinitely many constraints. (2) A dedicated dualization scheme yields a dual problem that is still nonconvex but has finitely many constraints. (3) We linearize this dual problem and (4) solve the resulting finite linear program to obtain locally optimal solutions to the original problem. The resulting problem formulation is exponentially smaller than those resulting from existing methods. We demonstrate the applicability of our algorithm using large instances of an aircraft collision-avoidance scenario and a novel spacecraft motion planning case study.

AIJun 30, 2020
Verification of indefinite-horizon POMDPs

Alexander Bork, Sebastian Junges, Joost-Pieter Katoen et al.

The verification problem in MDPs asks whether, for any policy resolving the nondeterminism, the probability that something bad happens is bounded by some given threshold. This verification problem is often overly pessimistic, as the policies it considers may depend on the complete system state. This paper considers the verification problem for partially observable MDPs, in which the policies make their decisions based on (the history of) the observations emitted by the system. We present an abstraction-refinement framework extending previous instantiations of the Lovejoy-approach. Our experiments show that this framework significantly improves the scalability of the approach.

AIJun 30, 2020
Enforcing Almost-Sure Reachability in POMDPs

Sebastian Junges, Nils Jansen, Sanjit A. Seshia

Partially-Observable Markov Decision Processes (POMDPs) are a well-known stochastic model for sequential decision making under limited information. We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state. In particular, we are interested in computing the winning region, that is, the set of system configurations from which a policy exists that satisfies the reachability specification. A direct application of such a winning region is the safe exploration of POMDPs by, for instance, restricting the behavior of a reinforcement learning agent to the region. We present two algorithms: A novel SAT-based iterative approach and a decision-diagram based alternative. The empirical evaluation demonstrates the feasibility and efficacy of the approaches.

SEFeb 17, 2020
The Probabilistic Model Checker Storm

Christian Hensel, Sebastian Junges, Joost-Pieter Katoen et al.

We present the probabilistic model checker Storm. Storm supports the analysis of discrete- and continuous-time variants of both Markov chains and Markov decision processes. Storm has three major distinguishing features. It supports multiple input languages for Markov models, including the JANI and PRISM modeling languages, dynamic fault trees, generalized stochastic Petri nets, and the probabilistic guarded command language. It has a modular set-up in which solvers and symbolic engines can easily be exchanged. Its Python API allows for rapid prototyping by encapsulating Storm's fast and scalable algorithms. This paper reports on the main features of Storm and explains how to effectively use them. A description is provided of the main distinguishing functionalities of Storm. Finally, an empirical evaluation of different configurations of Storm on the QComp 2019 benchmark set is presented.

SEApr 28, 2019
Counterexample-Driven Synthesis for Probabilistic Program Sketches

Milan Češka, Christian Hensel, Sebastian Junges et al.

Probabilistic programs are key to deal with uncertainty in e.g. controller synthesis. They are typically small but intricate. Their development is complex and error prone requiring quantitative reasoning over a myriad of alternative designs. To mitigate this complexity, we adopt counterexample-guided inductive synthesis (CEGIS) to automatically synthesise finite-state probabilistic programs. Our approach leverages efficient model checking, modern SMT solving, and counterexample generation at program level. Experiments on practically relevant case studies show that design spaces with millions of candidate designs can be fully explored using a few thousand verification queries.

SEMar 13, 2019
Safety Analysis for Vehicle Guidance Systems with Dynamic Fault Trees

Majdi Ghadhab, Sebastian Junges, Joost-Pieter Katoen et al.

This paper considers the design-phase safety analysis of vehicle guidance systems. The proposed approach constructs dynamic fault trees (DFTs) to model a variety of safety concepts and E/E architectures for drive automation. The fault trees can be used to evaluate various quantitative measures by means of model checking. The approach is accompanied by a large-scale evaluation: The resulting DFTs with up to 300 elements constitute larger-than-before DFTs, yet the concepts and architectures can be evaluated in a matter of minutes.

LOFeb 15, 2019
Shepherding Hordes of Markov Chains

Milan Ceska, Nils Jansen, Sebastian Junges et al.

This paper considers large families of Markov chains (MCs) that are defined over a set of parameters with finite discrete domains. Such families occur in software product lines, planning under partial observability, and sketching of probabilistic programs. Simple questions, like `does at least one family member satisfy a property?', are NP-hard. We tackle two problems: distinguish family members that satisfy a given quantitative property from those that do not, and determine a family member that satisfies the property optimally, i.e., with the highest probability or reward. We show that combining two well-known techniques, MDP model checking and abstraction refinement, mitigates the computational complexity. Experiments on a broad set of benchmarks show that in many situations, our approach is able to handle families of millions of MCs, providing superior scalability compared to existing solutions.

AISep 28, 2018
The Partially Observable Games We Play for Cyber Deception

Mohamadreza Ahmadi, Murat Cubuktepe, Nils Jansen et al.

Progressively intricate cyber infiltration mechanisms have made conventional means of defense, such as firewalls and malware detectors, incompetent. These sophisticated infiltration mechanisms can study the defender's behavior, identify security caveats, and modify their actions adaptively. To tackle these security challenges, cyber-infrastructures require active defense techniques that incorporate cyber deception, in which the defender (deceiver) implements a strategy to mislead the infiltrator. To this end, we use a two-player partially observable stochastic game (POSG) framework, wherein the deceiver has full observability over the states of the POSG, and the infiltrator has partial observability. Then, the deception problem is to compute a strategy for the deceiver that minimizes the expected cost of deception against all strategies of the infiltrator. We first show that the underlying problem is a robust mixed-integer linear program, which is intractable to solve in general. Towards a scalable approach, we compute optimal finite-memory strategies for the infiltrator by a reduction to a series of synthesis problems for parametric Markov decision processes. We use these infiltration strategies to find robust strategies for the deceiver using mixed-integer linear programming. We illustrate the performance of our technique on a POSG model for network security. Our experiments demonstrate that the proposed approach handles scenarios considerably larger than those of the state-of-the-art methods.

AIJul 16, 2018
Safe Reinforcement Learning via Probabilistic Shields

Nils Jansen, Bettina Könighofer, Sebastian Junges et al.

This paper targets the efficient construction of a safety shield for decision making in scenarios that incorporate uncertainty. Markov decision processes (MDPs) are prominent models to capture such planning problems. Reinforcement learning (RL) is a machine learning technique to determine near-optimal policies in MDPs that may be unknown prior to exploring the model. However, during exploration, RL is prone to induce behavior that is undesirable or not allowed in safety- or mission-critical contexts. We introduce the concept of a probabilistic shield that enables decision-making to adhere to safety constraints with high probability. In a separation of concerns, we employ formal verification to efficiently compute the probabilities of critical decisions within a safety-relevant fragment of the MDP. We use these results to realize a shield that is applied to an RL algorithm which then optimizes the actual performance objective. We discuss tradeoffs between sufficient progress in exploration of the environment and ensuring safety. In our experiments, we demonstrate on the arcade game PAC-MAN and on a case study involving service robots that the learning efficiency increases as the learning needs orders of magnitude fewer episodes.

SEMar 14, 2018
One Net Fits All: A unifying semantics of Dynamic Fault Trees using GSPNs

Sebastian Junges, Joost-Pieter Katoen, Marielle Stoelinga et al.

Dynamic Fault Trees (DFTs) are a prominent model in reliability engineering. They are strictly more expressive than static fault trees, but this comes at a price: their interpretation is non-trivial and leaves quite some freedom. This paper presents a GSPN semantics for DFTs. This semantics is rather simple and compositional. The key feature is that this GSPN semantics unifies all existing DFT semantics from the literature. All semantic variants can be obtained by choosing appropriate priorities and treatment of non-determinism.

ROAug 14, 2017
Strategy Synthesis in POMDPs via Game-Based Abstractions

Leonore Winterer, Sebastian Junges, Ralf Wimmer et al.

We study synthesis problems with constraints in partially observable Markov decision processes (POMDPs), where the objective is to compute a strategy for an agent that is guaranteed to satisfy certain safety and performance specifications. Verification and strategy synthesis for POMDPs are, however, computationally intractable in general. We alleviate this difficulty by focusing on planning applications and exploiting typical structural properties of such scenarios; for instance, we assume that the agent has the ability to observe its own position inside an environment. We propose an abstraction refinement framework which turns such a POMDP model into a (fully observable) probabilistic two-player game (PG). For the obtained PGs, efficient verification and synthesis tools allow to determine strategies with optimal safety and performance measures, which approximate optimal schedulers on the POMDP. If the approximation is too coarse to satisfy the given specifications, an refinement scheme improves the computed strategies. As a running example, we use planning problems where an agent moves inside an environment with randomly moving obstacles and restricted observability. We demonstrate that the proposed method advances the state of the art by solving problems several orders-of-magnitude larger than those that can be handled by existing POMDP solvers. Furthermore, this method gives guarantees on safety constraints, which is not supported by the majority of the existing solvers.

SEFeb 14, 2017
A storm is Coming: A Modern Probabilistic Model Checker

Christian Dehnert, Sebastian Junges, Joost-Pieter Katoen et al.

We launch the new probabilistic model checker storm. It features the analysis of discrete- and continuous-time variants of both Markov chains and MDPs. It supports the PRISM and JANI modeling languages, probabilistic programs, dynamic fault trees and generalized stochastic Petri nets. It has a modular set-up in which solvers and symbolic engines can easily be exchanged. It offers a Python API for rapid prototyping by encapsulating storm's fast and scalable algorithms. Experiments on a variety of benchmarks show its competitive performance.

AIOct 28, 2016
Probabilistic Model Checking for Complex Cognitive Tasks -- A case study in human-robot interaction

Sebastian Junges, Nils Jansen, Joost-Pieter Katoen et al.

This paper proposes to use probabilistic model checking to synthesize optimal robot policies in multi-tasking autonomous systems that are subject to human-robot interaction. Given the convincing empirical evidence that human behavior can be related to reinforcement models, we take as input a well-studied Q-table model of the human behavior for flexible scenarios. We first describe an automated procedure to distill a Markov decision process (MDP) for the human in an arbitrary but fixed scenario. The distinctive issue is that -- in contrast to existing models -- under-specification of the human behavior is included. Probabilistic model checking is used to predict the human's behavior. Finally, the MDP model is extended with a robot model. Optimal robot policies are synthesized by analyzing the resulting two-player stochastic game. Experimental results with a prototypical implementation using PRISM show promising results.

SEOct 27, 2016
The Probabilistic Model Checker Storm (Extended Abstract)

Christian Dehnert, Sebastian Junges, Joost-Pieter Katoen et al.

We present a new probabilistic model checker Storm. Using state-of-the-art libraries, we aim for both high performance and versatility. This extended abstract gives a brief overview of the features of Storm.

SEApr 25, 2016
Advancing Dynamic Fault Tree Analysis

Matthias Volk, Sebastian Junges, Joost-Pieter Katoen

This paper presents a new state space generation approach for dynamic fault trees (DFTs) together with a technique to synthesise failures rates in DFTs. Our state space generation technique aggressively exploits the DFT structure --- detecting symmetries, spurious non-determinism, and don't cares. Benchmarks show a gain of more than two orders of magnitude in terms of state space generation and analysis time. Our approach supports DFTs with symbolic failure rates and is complemented by parameter synthesis. This enables determining the maximal tolerable failure rate of a system component while ensuring that the mean time of failure stays below a threshold.

SEOct 20, 2015
Safety-Constrained Reinforcement Learning for MDPs

Sebastian Junges, Nils Jansen, Christian Dehnert et al.

We consider controller synthesis for stochastic and partially unknown environments in which safety is essential. Specifically, we abstract the problem as a Markov decision process in which the expected performance is measured using a cost function that is unknown prior to run-time exploration of the state space. Standard learning approaches synthesize cost-optimal strategies without guaranteeing safety properties. To remedy this, we first compute safe, permissive strategies. Then, exploration is constrained to these strategies and thereby meets the imposed safety requirements. Exploiting an iterative learning procedure, the resulting policy is safety-constrained and optimal. We show correctness and completeness of the method and discuss the use of several heuristics to increase its scalability. Finally, we demonstrate the applicability by means of a prototype implementation.