Roderick Bloem

h-index: 45

17 papers

1,173 citations

Novelty: 48%

AI Score: 43

Ranked #53,963 of 194,257 authors (top 28%) · #54 in LO (top 21%)

17 Papers

9.6 · LG · Dec 4, 2022 · Code

Automata Learning meets Shielding

Martin Tappler, Stefan Pranger, Bettina Könighofer et al.

Safety is still one of the major research challenges in reinforcement learning (RL). In this paper, we address the problem of how to avoid safety violations of RL agents during exploration in probabilistic and partially unknown environments. Our approach combines automata learning for Markov Decision Processes (MDPs) and shield synthesis in an iterative approach. Initially, the MDP representing the environment is unknown. The agent starts exploring the environment and collects traces. From the collected traces, we passively learn MDPs that abstractly represent the safety-relevant aspects of the environment. Given a learned MDP and a safety specification, we construct a shield. For each state-action pair within a learned MDP, the shield computes exact probabilities on how likely it is that executing the action results in violating the specification from the current state within the next $k$ steps. After the shield is constructed, the shield is used during runtime and blocks any of the agent's actions that induce too large a risk. The shielded agent continues to explore the environment and collects new data on the environment. Iteratively, we use the collected data to learn new MDPs with higher accuracy, resulting in turn in shields able to prevent more safety violations. We implemented our approach and present a detailed case study of a Q-learning agent exploring slippery Gridworlds. In our experiments, we show that as the agent explores more and more of the environment during training, the improved learned models lead to shields that are able to prevent many safety violations.
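The k-step risk computation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the MDP encoding, the toy model `EXAMPLE_MDP`, the function names, and the choice to assume the agent acts as safely as possible after the first action are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: mdp[state][action] = [(probability, next_state), ...]
MDP = Dict[str, Dict[str, List[Tuple[float, str]]]]

# Toy model (an assumption, not taken from the paper).
EXAMPLE_MDP: MDP = {
    "start": {"safe": [(1.0, "start")], "risky": [(0.8, "goal"), (0.2, "edge")]},
    "edge": {"back": [(0.5, "start"), (0.5, "cliff")]},
    "cliff": {},  # unsafe sink
    "goal": {"stay": [(1.0, "goal")]},
}


def min_violation_prob(mdp: MDP, unsafe: Set[str], k: int) -> Dict[str, Dict[str, float]]:
    """risk[s][a] = probability of reaching an unsafe state within k steps when
    taking a in s and then choosing the least risky action in every later step."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # v[s]: minimal violation probability with zero steps remaining.
    v = {s: 1.0 if s in unsafe else 0.0 for s in mdp}
    risk: Dict[str, Dict[str, float]] = {}
    for _ in range(k):
        # One backward step of bounded-horizon value iteration.
        risk = {
            s: {a: sum(p * v[t] for p, t in succ) for a, succ in mdp[s].items()}
            for s in mdp
        }
        v = {
            s: 1.0 if s in unsafe else (min(risk[s].values()) if risk[s] else 0.0)
            for s in mdp
        }
    return risk


def allowed_actions(risk: Dict[str, Dict[str, float]], state: str, threshold: float) -> List[str]:
    """Actions the shield lets through: those whose risk does not exceed the threshold."""
    return [a for a, r in risk[state].items() if r <= threshold]
```

With `k=2` and a threshold of 0.05 on the toy model, the sketch would block `risky` in `start` and let `safe` through. In the iterative scheme of the abstract, the table would be recomputed each time a new MDP is learned.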

16.5 · LG · Dec 4, 2022

Online Shielding for Reinforcement Learning

Bettina Könighofer, Julian Rudolf, Alexander Palmisano et al.

Despite the recent impressive results on reinforcement learning (RL), safety is still one of the major research challenges in RL. RL is a machine-learning approach to determine near-optimal policies in Markov decision processes (MDPs). In this paper, we consider the setting where the safety-relevant fragment of the MDP together with a temporal logic safety specification is given and many safety violations can be avoided by planning ahead a short time into the future. We propose an approach for online safety shielding of RL agents. During runtime, the shield analyses the safety of each available action. For any action, the shield computes the maximal probability to not violate the safety specification within the next $k$ steps when executing this action. Based on this probability and a given threshold, the shield decides whether to block an action from the agent. Existing offline shielding approaches compute exhaustively the safety of all state-action combinations ahead of time, resulting in huge computation times and large memory consumption. The intuition behind online shielding is to compute at runtime the set of all states that could be reached in the near future. For each of these states, the safety of all available actions is analysed and used for shielding as soon as one of the considered states is reached. Our approach is well suited for high-level planning problems where the time between decisions can be used for safety computations and it is acceptable for the agent to wait until these computations are finished. For our evaluation, we selected a 2-player version of the classical computer game SNAKE. The game represents a high-level planning problem that requires fast decisions and the multiplayer setting induces a large state space, which is computationally expensive to analyse exhaustively.
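The "states that could be reached in the near future" step can be pictured as a bounded breadth-first search from the current state. The sketch below is only an illustration under assumed names (`reachable_within`, a successor-list encoding of the MDP); the paper's actual data structures are not described in this abstract.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: mdp[state][action] = [(probability, next_state), ...]
MDP = Dict[str, Dict[str, List[Tuple[float, str]]]]


def reachable_within(mdp: MDP, start: str, horizon: int) -> Set[str]:
    """States reachable from `start` in at most `horizon` steps with positive probability."""
    frontier = {start}
    seen = {start}
    for _ in range(horizon):
        nxt = set()
        for s in frontier:
            for succ in mdp.get(s, {}).values():
                for p, t in succ:
                    if p > 0 and t not in seen:
                        nxt.add(t)
        seen |= nxt
        frontier = nxt
    return seen
```

An online shield would then analyse the actions of only these states while the agent is busy with its current step, rather than tabulating every state-action pair ahead of time.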

13.1 · AI · Aug 30, 2022

Correct-by-Construction Runtime Enforcement in AI -- A Survey

Bettina Könighofer, Roderick Bloem, Rüdiger Ehlers et al.

Runtime enforcement refers to the theories, techniques, and tools for enforcing correct behavior with respect to a formal specification of systems at runtime. In this paper, we are interested in techniques for constructing runtime enforcers for the concrete application domain of enforcing safety in AI. We discuss how safety is traditionally handled in the field of AI and how more formal guarantees on the safety of a self-learning agent can be given by integrating a runtime enforcer. We survey a selection of work on such enforcers, where we distinguish between approaches for discrete and continuous action spaces. The purpose of this paper is to foster a better understanding of advantages and limitations of different enforcement techniques, focusing on the specific challenges that arise due to their application in AI. Finally, we present some open challenges and avenues for future work.

5.4 · AI · Jul 5, 2023 · Code

Safety Shielding under Delayed Observation

Filip Cano Córdoba, Alexander Palmisano, Martin Fränzle et al.

Agents operating in physical environments need to be able to handle delays in the input and output signals since neither data transmission nor sensing or actuating the environment are instantaneous. Shields are correct-by-construction runtime enforcers that guarantee safe execution by correcting any action that may cause a violation of a formal safety specification. Besides providing safety guarantees, shields should interfere minimally with the agent. Therefore, shields should pick the safe corrective actions in such a way that future interferences are most likely minimized. Current shielding approaches do not consider possible delays in the input signals in their safety analyses. In this paper, we address this issue. We propose synthesis algorithms to compute delay-resilient shields that guarantee safety under worst-case assumptions on the delays of the input signals. We also introduce novel heuristics for deciding between multiple corrective actions, designed to minimize future shield interferences caused by delays. As a further contribution, we present the first integration of shields in a realistic driving simulator. We implemented our delayed shields in the driving simulator CARLA. We shield potentially unsafe autonomous driving agents in different safety-critical scenarios and show the effect of delays on the safety analysis.

7.7 · CR · Mar 20

Sharing The Secret: Distributed Privacy-Preserving Monitoring

Mahyar Karimi, K. S. Thejaswini, Roderick Bloem et al.

In traditional runtime verification, a system is typically observed by a monolithic monitor. Enforcing privacy in such settings is computationally expensive, as it necessitates heavy cryptographic primitives. Therefore, privacy-preserving monitoring remains impractical for real-time applications. In this work, we address this scalability challenge by distributing the monitor across multiple parties -- at least one of which is honest. This architecture enables the use of efficient secret-sharing schemes instead of computationally intensive cryptography, dramatically reducing overhead while maintaining strong privacy guarantees. While existing secret-sharing approaches are typically limited to one-shot executions which do not maintain an internal state, we introduce a protocol tailored for continuous monitoring that supports repeated evaluations over an evolving internal state (kept secret from the system and the monitoring entities). We implement our approach using the MP-SPDZ framework. Our experiments demonstrate that, under these architectural assumptions, our protocol is significantly more scalable than existing alternatives.
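To make the idea of a secret-shared monitor state concrete, here is a small sketch of additive secret sharing over a prime field. It is plain Python rather than MP-SPDZ code, it is not the paper's protocol, and the modulus `P`, the function names, and the event-counter example are all assumptions chosen for illustration.

```python
import secrets
from typing import List

P = 2**61 - 1  # assumed prime modulus for the illustration


def share(value: int, n: int) -> List[int]:
    """Split `value` into n additive shares modulo P; any n-1 of them look uniformly random."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % P)
    return parts


def add_shares(a: List[int], b: List[int]) -> List[int]:
    """Each party adds its two shares locally; no communication is needed for addition."""
    return [(x + y) % P for x, y in zip(a, b)]


def reconstruct(shares: List[int]) -> int:
    """Recombine all shares; only done when a verdict is to be revealed."""
    return sum(shares) % P


def count_events(observations: List[int], parties: int = 3) -> int:
    """Keep a secret-shared running counter across observations, then open it."""
    state = share(0, parties)
    for obs in observations:
        # The system shares each observation; the parties update their state shares.
        state = add_shares(state, share(obs, parties))
    return reconstruct(state)
```

The counter persists across observations as shares only, which mirrors the evolving internal state mentioned in the abstract; non-linear monitor updates would need an interactive multiplication protocol that this sketch omits.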

3.8 · CR · Jul 5, 2021

Proving SIFA Protection of Masked Redundant Circuits

Vedad Hadzic, Robert Primas, Roderick Bloem

Implementation attacks like side-channel and fault attacks pose a considerable threat to cryptographic devices that are physically accessible by an attacker. As a consequence, devices like smart cards implement corresponding countermeasures like redundant computation and masking. Recently, statistically ineffective fault attacks (SIFA) were shown to be able to circumvent these classical countermeasure techniques. We present a new approach for verifying the SIFA protection of arbitrary masked implementations in both hardware and software. The proposed method uses Boolean dependency analysis, factorization, and known properties of masked computations to show whether the fault detection mechanism of redundant masked circuits can leak information about the processed secret values. We implemented this new method in a tool called Danira, which can show the SIFA resistance of cryptographic implementations like AES S-Boxes within minutes.

1.2 · FL · Nov 15, 2020

Safety Synthesis Sans Specification

Roderick Bloem, Hana Chockler, Masoud Ebrahimi et al.

We define the problem of learning a transducer $S$ from a target language $U$ containing possibly conflicting transducers, using membership queries and conjecture queries. The requirement is that the language of $S$ be a subset of $U$. We argue that this is a natural question in many situations in hardware and software verification. We devise a learning algorithm for this problem and show that its time and query complexity are polynomial with respect to the rank of the target language, its incompatibility measure, and the maximal length of a given counterexample. We report on experiments conducted with a prototype implementation.

3.0 · SE · Oct 13, 2020

Adaptive Testing for Specification Coverage

Ezio Bartocci, Roderick Bloem, Benedikt Maderbacher et al.

Ensuring correctness of cyber-physical systems (CPS) is an extremely challenging task that is in practice often addressed with simulation-based testing. Formal specification languages, such as Signal Temporal Logic (STL), are used to mathematically express CPS requirements and thus render the simulation activity more systematic and principled. We propose a novel method for adaptive generation of tests with specification coverage for STL. To achieve this goal, we devise cooperative reachability games that we combine with numerical optimization to create tests that explore the system in a way that exercises various parts of the specification. To the best of our knowledge our approach is the first adaptive testing approach that can be applied directly to MATLAB™ Simulink/Stateflow models. We implemented our approach in a prototype tool and evaluated it on several illustrative examples and a case study from the avionics domain, demonstrating the effectiveness of adaptive testing to (1) incrementally build a test case that reaches a test objective, (2) generate a test suite that increases the specification coverage, and (3) infer what part of the specification is actually implemented.

2.7 · LG · Jul 10, 2019

Learning a Behavior Model of Hybrid Systems Through Combining Model-Based Testing and Machine Learning (Full Version)

Bernhard K. Aichernig, Roderick Bloem, Masoud Ebrahimi et al.

Models play an essential role in the design process of cyber-physical systems. They form the basis for simulation and analysis and help in identifying design problems as early as possible. However, the construction of models that comprise physical and digital behavior is challenging. Therefore, there is considerable interest in learning such hybrid behavior by means of machine learning which requires sufficient and representative training data covering the behavior of the physical system adequately. In this work, we exploit a combination of automata learning and model-based testing to generate sufficient training data fully automatically. Experimental results on a platooning scenario show that recurrent neural networks learned with this data achieved significantly better results compared to models learned from randomly generated data. In particular, the classification error for crash detection is reduced by a factor of five and a similar F1-score is obtained with up to three orders of magnitude fewer training samples.

17.9 · SE · Apr 15, 2019

Model-Based Testing IoT Communication via Active Automata Learning

Martin Tappler, Bernhard K. Aichernig, Roderick Bloem

This paper presents a learning-based approach to detecting failures in reactive systems. The technique is based on inferring models of multiple implementations of a common specification which are pair-wise cross-checked for equivalence. Any counterexample to equivalence is flagged as suspicious and has to be analysed manually. Hence, it is possible to find potential failures in a semi-automatic way without prior modelling. We show that the approach is effective by means of a case study. For this case study, we carried out experiments in which we learned models of five implementations of brokers/servers for MQTT, a protocol used in the Internet of Things. Examining these models, we found several violations of the MQTT specification. All but one of the considered implementations showed faulty behaviour. In the analysis, we discuss effectiveness and also issues we faced.
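The pair-wise cross-check can be sketched as a breadth-first search over the product of two learned Mealy machines, returning a shortest input sequence on which their outputs differ. The encoding, the function name `find_counterexample`, and the two toy models are assumptions for illustration; they are not real MQTT broker behaviour and not the tooling used in the paper.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (initial_state, {(state, input): (output, next_state)})
Mealy = Tuple[int, Dict[Tuple[int, str], Tuple[str, int]]]


def find_counterexample(m1: Mealy, m2: Mealy, inputs: List[str]) -> Optional[List[str]]:
    """Shortest input word on which the two machines produce different outputs, or None."""
    (q1, t1), (q2, t2) = m1, m2
    queue = deque([(q1, q2, [])])
    seen = {(q1, q2)}
    while queue:
        s1, s2, word = queue.popleft()
        for i in inputs:
            o1, n1 = t1[(s1, i)]
            o2, n2 = t2[(s2, i)]
            if o1 != o2:
                return word + [i]  # suspicious trace, to be inspected manually
            if (n1, n2) not in seen:
                seen.add((n1, n2))
                queue.append((n1, n2, word + [i]))
    return None


# Two toy models that differ only in how a second "connect" is answered.
MODEL_A: Mealy = (0, {
    (0, "connect"): ("ack", 1), (0, "publish"): ("closed", 0),
    (1, "connect"): ("closed", 0), (1, "publish"): ("ok", 1),
})
MODEL_B: Mealy = (0, {
    (0, "connect"): ("ack", 1), (0, "publish"): ("closed", 0),
    (1, "connect"): ("ack", 1), (1, "publish"): ("ok", 1),
})
```

On the toy models, the search reports the sequence `connect, connect`. Which of the two behaviours is actually faulty still has to be decided against the specification, as the abstract notes.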

4.9 · SE · Sep 5, 2018

Synthesizing Adaptive Test Strategies from Temporal Logic Specifications

Roderick Bloem, Goerschwin Fey, Fabian Greif et al.

Constructing good test cases is difficult and time-consuming, especially if the system under test is still under development and its exact behavior is not yet fixed. We propose a new approach to compute test strategies for reactive systems from a given temporal logic specification using formal methods. The computed strategies are guaranteed to reveal certain simple faults in every realization of the specification and for every behavior of the uncontrollable part of the system's environment. The proposed approach supports different assumptions on occurrences of faults (ranging from a single transient fault to a persistent fault) and by default aims at unveiling the weakest one. Based on well-established hypotheses from fault-based testing, we argue that such tests are also sensitive for more complex bugs. Since the specification may not define the system behavior completely, we use reactive synthesis algorithms with partial information. The computed strategies are adaptive test strategies that react to behavior at runtime. We work out the underlying theory of adaptive test strategy synthesis and present experiments for a safety-critical component of a real-world satellite system. We demonstrate that our approach can be applied to industrial specifications and that the synthesized test strategies are capable of detecting bugs that are hard to detect with random testing.

20.0 · AI · Jul 16, 2018

Safe Reinforcement Learning via Probabilistic Shields

Nils Jansen, Bettina Könighofer, Sebastian Junges et al.

This paper targets the efficient construction of a safety shield for decision making in scenarios that incorporate uncertainty. Markov decision processes (MDPs) are prominent models to capture such planning problems. Reinforcement learning (RL) is a machine learning technique to determine near-optimal policies in MDPs that may be unknown prior to exploring the model. However, during exploration, RL is prone to induce behavior that is undesirable or not allowed in safety- or mission-critical contexts. We introduce the concept of a probabilistic shield that enables decision-making to adhere to safety constraints with high probability. In a separation of concerns, we employ formal verification to efficiently compute the probabilities of critical decisions within a safety-relevant fragment of the MDP. We use these results to realize a shield that is applied to an RL algorithm which then optimizes the actual performance objective. We discuss tradeoffs between sufficient progress in exploration of the environment and ensuring safety. In our experiments, we demonstrate on the arcade game PAC-MAN and on a case study involving service robots that the learning efficiency increases as the learning needs orders of magnitude fewer episodes.

43.1 · LO · Aug 29, 2017

Safe Reinforcement Learning via Shielding

Mohammed Alshiekh, Roderick Bloem, Ruediger Ehlers et al.

Reinforcement learning algorithms discover policies that maximize reward, but do not necessarily guarantee safety during learning or execution phases. We introduce a new approach to learn optimal policies while enforcing properties expressed in temporal logic. To this end, given the temporal logic specification that is to be obeyed by the learning system, we propose to synthesize a reactive system called a shield. The shield is introduced in the traditional learning process in two alternative ways, depending on the location at which the shield is implemented. In the first one, the shield acts each time the learning agent is about to make a decision and provides a list of safe actions. In the second way, the shield is introduced after the learning agent. The shield monitors the actions from the learner and corrects them only if the chosen action causes a violation of the specification. We discuss which requirements a shield must meet to preserve the convergence guarantees of the learner. Finally, we demonstrate the versatility of our approach on several challenging reinforcement learning scenarios.
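The two placements of the shield described in this abstract can be contrasted in a few lines. The sketch assumes the synthesized shield is available as a lookup table from states to safe actions (`SAFE`, a made-up example); the function names and the rule of falling back to the first safe action are likewise illustrative assumptions.

```python
from typing import Dict, List

# Hypothetical stand-in for a synthesized shield: safe actions per state.
# Assumption: every state has at least one safe action.
SAFE: Dict[str, List[str]] = {
    "near_wall": ["left", "stop"],
    "open_field": ["left", "right", "forward", "stop"],
}


def preemptive_shield(state: str, safe: Dict[str, List[str]]) -> List[str]:
    """Shield before the learner: hand the agent the list of safe actions to choose from."""
    return list(safe[state])


def post_posed_shield(state: str, proposed: str, safe: Dict[str, List[str]]) -> str:
    """Shield after the learner: pass the action through unless it would violate the spec."""
    allowed = safe[state]
    if proposed in allowed:
        return proposed   # no interference
    return allowed[0]     # correction; the choice of replacement is an assumption
```

For example, `post_posed_shield("near_wall", "forward", SAFE)` would be corrected to `left`, while `forward` in `open_field` passes through unchanged.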

1.2 · LO · Nov 4, 2016

QBF Solving by Counterexample-guided Expansion

Roderick Bloem, Nicolas Braud-Santoni, Vedad Hadzic

We introduce a novel generalization of Counterexample-Guided Inductive Synthesis (CEGIS) and instantiate it to yield a novel, competitive algorithm for solving Quantified Boolean Formulas (QBF). Current QBF solvers based on counterexample-guided expansion use a recursive approach which scales poorly with the number of quantifier alternations. Our generalization of CEGIS removes the need for this recursive approach, and we instantiate it to yield a simple and efficient algorithm for QBF solving. Lastly, this research is supported by a competitive, though straightforward, implementation of the algorithm, making it possible to study the practical impact of our algorithm design decisions, along with various optimizations.
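For orientation, the classical two-level CEGIS loop for a formula of the form ∃x ∀y. φ(x, y) is sketched below. This is the textbook scheme, not the generalization introduced in the paper; brute-force enumeration stands in for the SAT solver calls a real implementation would use, and the function name and formula encoding are assumptions.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

Assignment = Tuple[bool, ...]


def cegis_exists_forall(
    phi: Callable[[Assignment, Assignment], bool], nx: int, ny: int
) -> Optional[Assignment]:
    """Return x such that phi(x, y) holds for all y, or None if no such x exists."""
    counterexamples: List[Assignment] = []
    for x in product([False, True], repeat=nx):
        # Synthesis step: the candidate must satisfy phi on all counterexamples so far.
        if not all(phi(x, y) for y in counterexamples):
            continue
        # Verification step: search for a y that refutes the candidate.
        cex = next((y for y in product([False, True], repeat=ny) if not phi(x, y)), None)
        if cex is None:
            return x  # no refutation exists, so x is a witness
        counterexamples.append(cex)
    return None
```

For φ(x, y) = x₀ ∨ ¬y₀ the loop rejects x₀ = false after one counterexample and returns x₀ = true. Formulas with more quantifier alternations are where recursive expansion-based solvers nest this loop, which is the scaling issue the abstract refers to.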

1.2 · LO · Sep 16, 2014

Automatic Error Localization for Software using Deductive Verification

Robert Koenighofer, Ronald Toegl, Roderick Bloem

Even competent programmers make mistakes. Automatic verification can detect errors, but leaves the frustrating task of finding the erroneous line of code to the user. This paper presents an automatic approach for identifying potential error locations in software. It is based on a deductive verification engine, which detects errors in functions annotated with pre- and post-conditions. Using an automatic theorem prover, our approach finds expressions in the code that can be modified such that the program satisfies its specification. Scalability is achieved by analyzing each function in isolation. We have implemented our approach in the widely used Frama-C framework and present first experimental results. This is an extended version of [8], featuring an additional appendix.

1.2 · LO · Jul 21, 2014

Parameterized Synthesis Case Study: AMBA AHB

Roderick Bloem, Swen Jacobs, Ayrat Khalimov

We revisit the AMBA AHB case study that has been used as a benchmark for several reactive synthesis tools. Synthesizing AMBA AHB implementations that can serve a large number of masters is still a difficult problem. We demonstrate how to use parameterized synthesis in token rings to obtain an implementation for a component that serves a single master, and can be arranged in a ring of arbitrarily many components. We describe new tricks - property decompositional synthesis, and direct encoding of simple GR(1) - that together with previously described optimizations allowed us to synthesize a component model with 14 states in about 1 hour.

11.2 · SE · Jun 30, 2014

Parameterized Synthesis Case Study: AMBA AHB (extended version)

Roderick Bloem, Swen Jacobs, Ayrat Khalimov

We revisit the AMBA AHB case study that has been used as a benchmark for several reactive synthesis tools. Synthesizing AMBA AHB implementations that can serve a large number of masters is still a difficult problem. We demonstrate how to use parameterized synthesis in token rings to obtain an implementation for a component that serves a single master, and can be arranged in a ring of arbitrarily many components. We describe new tricks -- property decompositional synthesis, and direct encoding of simple GR(1) -- that together with previously described optimizations allowed us to synthesize the model with 14 states in 30 minutes.