Thom Badings

AI
h-index12
13papers
152citations
Novelty45%
AI Score50

13 Papers

AIMay 29
Robust Shielding for Safe Reinforcement Learning

Edwin Hamel-De le Court, Thom Badings, Alessandro Abate et al.

Shielding is an effective approach to formally guarantee the safety of reinforcement learning agents in Markov decision processes (MDPs). However, existing shielding techniques typically assume knowledge of the safety-relevant transition dynamics - a requirement that is seldom met in practice. To address this limitation, we introduce a novel shielding framework for robust MDPs (RMDPs), i.e., MDPs with sets of transition probabilities. We define safety as the satisfaction of a linear temporal logic (LTL) formula with a certain threshold probability under the worst-case transition probabilities of the RMDP. We prove that our shielding framework is both sound and optimal for the RMDP: every policy admissible by the shield is safe, and conversely, every safe RMDP policy is admissible by the shield. We combine our approach with existing sampling methods for learning transition probabilities of MDPs with probably approximately correct (PAC) guarantees. This combination enables the construction of shields for MDPs that, with high confidence, guarantee safety while remaining minimally restrictive. Our experiments show that our shields for learned RMDPs guarantee safety in unknown MDPs while recovering strong expected return as the number of samples increases.

SYJan 4, 2023
Robust Control for Dynamical Systems With Non-Gaussian Noise via Formal Abstractions

Thom Badings, Licio Romao, Alessandro Abate et al.

Controllers for dynamical systems that operate in safety-critical settings must account for stochastic disturbances. Such disturbances are often modeled as process noise in a dynamical system, and common assumptions are that the underlying distributions are known and/or Gaussian. In practice, however, these assumptions may be unrealistic and can lead to poor approximations of the true noise distribution. We present a novel controller synthesis method that does not rely on any explicit representation of the noise distributions. In particular, we address the problem of computing a controller that provides probabilistic guarantees on safely reaching a target, while also avoiding unsafe regions of the state space. First, we abstract the continuous control system into a finite-state model that captures noise by probabilistic transitions between discrete states. As a key contribution, we adapt tools from the scenario approach to compute probably approximately correct (PAC) bounds on these transition probabilities, based on a finite number of samples of the noise. We capture these bounds in the transition probability intervals of a so-called interval Markov decision process (iMDP). This iMDP is, with a user-specified confidence probability, robust against uncertainty in the transition probabilities, and the tightness of the probability intervals can be controlled through the number of samples. We use state-of-the-art verification techniques to provide guarantees on the iMDP and compute a controller for which these guarantees carry over to the original control system. In addition, we develop a tailored computational scheme that reduces the complexity of the synthesis of these guarantees on the iMDP. Benchmarks on realistic control systems show the practical applicability of our method, even when the iMDP has hundreds of millions of transitions.

SYOct 12, 2022
Probabilities Are Not Enough: Formal Controller Synthesis for Stochastic Dynamical Models with Epistemic Uncertainty

Thom Badings, Licio Romao, Alessandro Abate et al.

Capturing uncertainty in models of complex dynamical systems is crucial to designing safe controllers. Stochastic noise causes aleatoric uncertainty, whereas imprecise knowledge of model parameters leads to epistemic uncertainty. Several approaches use formal abstractions to synthesize policies that satisfy temporal specifications related to safety and reachability. However, the underlying models exclusively capture aleatoric but not epistemic uncertainty, and thus require that model parameters are known precisely. Our contribution to overcoming this restriction is a novel abstraction-based controller synthesis method for continuous-state models with stochastic noise and uncertain parameters. By sampling techniques and robust analysis, we capture both aleatoric and epistemic uncertainty, with a user-specified confidence level, in the transition probability intervals of a so-called interval Markov decision process (iMDP). We synthesize an optimal policy on this iMDP, which translates (with the specified confidence level) to a feedback controller for the continuous model with the same performance guarantees. Our experimental benchmarks confirm that accounting for epistemic uncertainty leads to controllers that are more robust against variations in parameter values.

AIMar 10, 2023
Decision-Making Under Uncertainty: Beyond Probabilities

Thom Badings, Thiago D. Simão, Marnix Suilen et al.

This position paper reflects on the state-of-the-art in decision-making under uncertainty. A classical assumption is that probabilities can sufficiently capture all uncertainty in a system. In this paper, the focus is on the uncertainty that goes beyond this classical interpretation, particularly by employing a clear distinction between aleatoric and epistemic uncertainty. The paper features an overview of Markov decision processes (MDPs) and extensions to account for partial observability and adversarial behavior. These models sufficiently capture aleatoric uncertainty but fail to account for epistemic uncertainty robustly. Consequently, we present a thorough overview of so-called uncertainty models that exhibit uncertainty in a more robust interpretation. We show several solution techniques for both discrete and continuous models, ranging from formal verification, over control-based abstractions, to reinforcement learning. As an integral part of this paper, we list and discuss several key challenges that arise when dealing with rich types of uncertainty in a model-based fashion.

SYDec 1, 2022
Formal Controller Synthesis for Markov Jump Linear Systems with Uncertain Dynamics

Luke Rickard, Thom Badings, Licio Romao et al.

Automated synthesis of provably correct controllers for cyber-physical systems is crucial for deployment in safety-critical scenarios. However, hybrid features and stochastic or unknown behaviours make this problem challenging. We propose a method for synthesising controllers for Markov jump linear systems (MJLSs), a class of discrete-time models for cyber-physical systems, so that they certifiably satisfy probabilistic computation tree logic (PCTL) formulae. An MJLS consists of a finite set of stochastic linear dynamics and discrete jumps between these dynamics that are governed by a Markov decision process (MDP). We consider the cases where the transition probabilities of this MDP are either known up to an interval or completely unknown. Our approach is based on a finite-state abstraction that captures both the discrete (mode-jumping) and continuous (stochastic linear) behaviour of the MJLS. We formalise this abstraction as an interval MDP (iMDP) for which we compute intervals of transition probabilities using sampling techniques from the so-called 'scenario approach', resulting in a probabilistically sound approximation. We apply our method to multiple realistic benchmark problems, in particular, a temperature control and an aerial vehicle delivery problem.

SYApr 1
Temporal Logic Control of Nonlinear Stochastic Systems with Online Performance Optimization

Alessandro Riccardi, Thom Badings, Luca Laurenti et al.

The deployment of autonomous systems in safety-critical environments requires control policies that guarantee satisfaction of complex control specifications. These systems are commonly modeled as nonlinear discrete-time stochastic systems. A~popular approach to computing a policy that provably satisfies a complex control specification is to construct a finite-state abstraction, often represented as a Markov decision process (MDP) with intervals of transition probabilities, i.e., an interval MDP (IMDP). However, existing abstraction techniques compute a \emph{single policy}, thus leaving no room for online cost or performance optimization, e.g., of energy consumption. To overcome this limitation, we propose a novel IMDP abstraction technique that yields a \emph{set of policies}, each of which satisfies the control specification with a certain minimum probability. We can thus use any online control algorithm to search through this set of verified policies while retaining the guaranteed satisfaction probability of the entire policy set. In particular, we employ model predictive control (MPC) to minimize a desired cost function that is independent of the control specification considered in the abstraction. Our experiments demonstrate that our approach yields better control performance than state-of-the-art single-policy abstraction techniques, with a small degradation of the guarantees.

SYNov 16, 2023
Correct-by-Construction Control for Stochastic and Uncertain Dynamical Models via Formal Abstractions

Thom Badings, Nils Jansen, Licio Romao et al.

Automated synthesis of correct-by-construction controllers for autonomous systems is crucial for their deployment in safety-critical scenarios. Such autonomous systems are naturally modeled as stochastic dynamical models. The general problem is to compute a controller that provably satisfies a given task, represented as a probabilistic temporal logic specification. However, factors such as stochastic uncertainty, imprecisely known parameters, and hybrid features make this problem challenging. We have developed an abstraction framework that can be used to solve this problem under various modeling assumptions. Our approach is based on a robust finite-state abstraction of the stochastic dynamical model in the form of a Markov decision process with intervals of probabilities (iMDP). We use state-of-the-art verification techniques to compute an optimal policy on the iMDP with guarantees for satisfying the given specification. We then show that, by construction, we can refine this policy into a feedback controller for which these guarantees carry over to the dynamical model. In this short paper, we survey our recent research in this area and highlight two challenges (related to scalability and dealing with nonlinear dynamics) that we aim to address with our ongoing research.

LGNov 9, 2025
Scalable Verification of Neural Control Barrier Functions Using Linear Bound Propagation

Nikolaus Vertovec, Frederik Baymler Mathiesen, Thom Badings et al.

Control barrier functions (CBFs) are a popular tool for safety certification of nonlinear dynamical control systems. Recently, CBFs represented as neural networks have shown great promise due to their expressiveness and applicability to a broad class of dynamics and safety constraints. However, verifying that a trained neural network is indeed a valid CBF is a computational bottleneck that limits the size of the networks that can be used. To overcome this limitation, we present a novel framework for verifying neural CBFs based on piecewise linear upper and lower bounds on the conditions required for a neural network to be a CBF. Our approach is rooted in linear bound propagation (LBP) for neural networks, which we extend to compute bounds on the gradients of the network. Combined with McCormick relaxation, we derive linear upper and lower bounds on the CBF conditions, thereby eliminating the need for computationally expensive verification procedures. Our approach applies to arbitrary control-affine systems and a broad range of nonlinear activation functions. To reduce conservatism, we develop a parallelizable refinement strategy that adaptively refines the regions over which these bounds are computed. Our approach scales to larger neural networks than state-of-the-art verification procedures for CBFs, as demonstrated by our numerical experiments.

AINov 18, 2024
Robust Markov Decision Processes: A Place Where AI and Formal Methods Meet

Marnix Suilen, Thom Badings, Eline M. Bovy et al.

Markov decision processes (MDPs) are a standard model for sequential decision-making problems and are widely used across many scientific areas, including formal methods and artificial intelligence (AI). MDPs do, however, come with the restrictive assumption that the transition probabilities need to be precisely known. Robust MDPs (RMDPs) overcome this assumption by instead defining the transition probabilities to belong to some uncertainty set. We present a gentle survey on RMDPs, providing a tutorial covering their fundamentals. In particular, we discuss RMDP semantics and how to solve them by extending standard MDP methods such as value iteration and policy iteration. We also discuss how RMDPs relate to other models and how they are used in several contexts, including reinforcement learning and abstraction techniques. We conclude with some challenges for future work on RMDPs.

AIAug 11, 2025
Best-Effort Policies for Robust Markov Decision Processes

Alessandro Abate, Thom Badings, Giuseppe De Giacomo et al.

We study the common generalization of Markov decision processes (MDPs) with sets of transition probabilities, known as robust MDPs (RMDPs). A standard goal in RMDPs is to compute a policy that maximizes the expected return under an adversarial choice of the transition probabilities. If the uncertainty in the probabilities is independent between the states, known as s-rectangularity, such optimal robust policies can be computed efficiently using robust value iteration. However, there might still be multiple optimal robust policies, which, while equivalent with respect to the worst-case, reflect different expected returns under non-adversarial choices of the transition probabilities. Hence, we propose a refined policy selection criterion for RMDPs, drawing inspiration from the notions of dominance and best-effort in game theory. Instead of seeking a policy that only maximizes the worst-case expected return, we additionally require the policy to achieve a maximal expected return under different (i.e., not fully adversarial) transition probabilities. We call such a policy an optimal robust best-effort (ORBE) policy. We prove that ORBE policies always exist, characterize their structure, and present an algorithm to compute them with a manageable overhead compared to standard robust value iteration. ORBE policies offer a principled tie-breaker among optimal robust policies. Numerical experiments show the feasibility of our approach.

LGJun 2, 2024
Policy Verification in Stochastic Dynamical Systems Using Logarithmic Neural Certificates

Thom Badings, Wietze Koops, Sebastian Junges et al.

We consider the verification of neural network policies for discrete-time stochastic systems with respect to reach-avoid specifications. We use a learner-verifier procedure that learns a certificate for the specification, represented as a neural network. Verifying that this neural network certificate is a so-called reach-avoid supermartingale (RASM) proves the satisfaction of a reach-avoid specification. Existing approaches for such a verification task rely on computed Lipschitz constants of neural networks. These approaches struggle with large Lipschitz constants, especially for reach-avoid specifications with high threshold probabilities. We present two key contributions to obtain smaller Lipschitz constants than existing approaches. First, we introduce logarithmic RASMs (logRASMs), which take exponentially smaller values than RASMs and hence have lower theoretical Lipschitz constants. Second, we present a fast method to compute tighter upper bounds on Lipschitz constants based on weighted norms. Our empirical evaluation shows we can consistently verify the satisfaction of reach-avoid specifications with probabilities as high as 99.9999%.

LGMay 1, 2023
Efficient Sensitivity Analysis for Parametric Robust Markov Chains

Thom Badings, Sebastian Junges, Ahmadreza Marandi et al.

We provide a novel method for sensitivity analysis of parametric robust Markov chains. These models incorporate parameters and sets of probability distributions to alleviate the often unrealistic assumption that precise probabilities are available. We measure sensitivity in terms of partial derivatives with respect to the uncertain transition probabilities regarding measures such as the expected reward. As our main contribution, we present an efficient method to compute these partial derivatives. To scale our approach to models with thousands of parameters, we present an extension of this method that selects the subset of $k$ parameters with the highest partial derivative. Our methods are based on linear programming and differentiating these programs around a given value for the parameters. The experiments show the applicability of our approach on models with over a million states and thousands of parameters. Moreover, we embed the results within an iterative learning scheme that profits from having access to a dedicated sensitivity analysis.

AIMar 3, 2021
Correct-by-construction reach-avoid control of partially observable linear stochastic systems

Thom Badings, Hasan A. Poonawala, Marielle Stoelinga et al.

We study feedback controller synthesis for reach-avoid control of discrete-time, linear time-invariant (LTI) systems with Gaussian process and measurement noise. The problem is to compute a controller such that, with at least some required probability, the system reaches a desired goal state in finite time while avoiding unsafe states. Due to stochasticity and nonconvexity, this problem does not admit exact algorithmic or closed-form solutions in general. Our key contribution is a correct-by-construction controller synthesis scheme based on a finite-state abstraction of a Gaussian belief over the unmeasured state, obtained using a Kalman filter. We formalize this abstraction as a Markov decision process (MDP). To be robust against numerical imprecision in approximating transition probabilities, we use MDPs with intervals of transition probabilities. By construction, any policy on the abstraction can be refined into a piecewise linear feedback controller for the LTI system. We prove that the closed-loop LTI system under this controller satisfies the reach-avoid problem with at least the required probability. The numerical experiments show that our method is able to solve reach-avoid problems for systems with up to 6D state spaces, and with control input constraints that cannot be handled by methods such as the rapidly-exploring random belief trees (RRBT).