Sadegh Soudjani

SY
h-index60
31papers
327citations
Novelty52%
AI Score56

31 Papers

SYJun 3
Multi-Agent Temporal Logic Planning via Penalty Functions and Block-Coordinate Optimization

Eleftherios E. Vlahakis, Arash Bahari Kordabad, Lars Lindemann et al.

Multi-agent planning under Signal Temporal Logic (STL) is often hindered by collaborative tasks that lead to computational challenges due to the inherent high dimensionality of the problem, preventing scalable synthesis with satisfaction guarantees. To address this, we formulate STL planning as an optimization program under multi-agent STL constraints and introduce a penalty-based unconstrained relaxation that can be efficiently solved via a Block-Coordinate Gradient Descent (BCGD) method, where each block corresponds to a single agent's decision variables, thereby mitigating complexity. By utilizing a quadratic penalty function defined via smooth STL semantics, we show that BCGD iterations converge to a stationary point of the penalized problem under standard regularity assumptions. To enforce feasibility, the BCGD solver is embedded within a two-layer optimization scheme: inner BCGD updates are performed for a fixed penalty parameter, which is then increased in an outer loop to progressively improve multi-agent STL robustness. The proposed framework enables scalable computations and is validated through various complex multi-robot planning scenarios.

SYJun 29, 2018
Temporal Logic Verification of Stochastic Systems Using Barrier Certificates

Pushpak Jagtap, Sadegh Soudjani, Majid Zamani

This paper presents a methodology for temporal logic verification of discrete-time stochastic systems. Our goal is to find a lower bound on the probability that a complex temporal property is satisfied by finite traces of the system. Desired temporal properties of the system are expressed using a fragment of linear temporal logic, called safe LTL over finite traces. We propose to use barrier certificates for computations of such lower bounds, which is computationally much more efficient than the existing discretization-based approaches. The new approach is discretization-free and does not suffer from the curse of dimensionality caused by discretizing state sets. The proposed approach relies on decomposing the negation of the specification into a union of sequential reachabilities and then using barrier certificates to compute upper bounds for these reachability probabilities. We demonstrate the effectiveness of the proposed approach on case studies with linear and polynomial dynamics.

SYFeb 14, 2020
Compositional (In)Finite Abstractions for Large-Scale Interconnected Stochastic Systems

Abolfazl Lavaei, Sadegh Soudjani, Majid Zamani

This paper is concerned with a compositional approach for constructing both infinite (reduced-order models) and finite abstractions (a.k.a. finite Markov decision processes (MDPs)) of large-scale interconnected discrete-time stochastic systems. The proposed framework is based on the notion of stochastic simulation functions enabling us to employ an abstract system as a substitution of the original one in the controller design process with guaranteed error bounds. In the first part of the paper, we derive sufficient small-gain type conditions for the compositional quantification of the probabilistic distance between the interconnection of stochastic control subsystems and that of their infinite abstractions. We then construct infinite abstractions together with their corresponding stochastic simulation functions for a particular class of discrete-time nonlinear stochastic control systems. In the second part of the paper, we leverage small-gain type conditions for the compositional construction of finite abstractions. We propose an approach to construct finite MDPs as finite abstractions of concrete models or their reduced-order versions satisfying an incremental input-to-state stability property. We demonstrate the effectiveness of the proposed results by applying our approaches to a fully interconnected network of 20 nonlinear subsystems (totally 100 dimensions). We construct finite MDPs from their reduced-order versions (together 20 dimensions) with guaranteed error bounds on their output trajectories. We also apply the proposed results to a temperature regulation in a circular building and construct compositionally a finite abstraction of a network containing 1000 rooms. We employ the constructed finite abstractions as substitutes to compositionally synthesize policies regulating the temperature in each room for a bounded time horizon.

SYNov 27, 2018
Temporal logic control of general Markov decision processes by approximate policy refinement

Sofie Haesaert, Sadegh Soudjani, Alessandro Abate

The formal verification and controller synthesis for Markov decision processes that evolve over uncountable state spaces are computationally hard and thus generally rely on the use of approximations. In this work, we consider the correct-by-design control of general Markov decision processes (gMDPs) with respect to temporal logic properties by leveraging approximate probabilistic relations between the original model and its abstraction. We newly work with a robust satisfaction for the construction and verification of control strategies, which allows for both deviations in the outputs of the gMDPs and in the probabilistic transitions. The computation is done over the reduced or abstracted models, such that when a property is robustly satisfied on the abstract model, it is also satisfied on the original model with respect to a refined control strategy.

SYDec 21, 2017
From Dissipativity Theory to Compositional Construction of Finite Markov Decision Processes

Abolfazl Lavaei, Sadegh Soudjani, Majid Zamani

This paper is concerned with a compositional approach for constructing finite Markov decision processes of interconnected discrete-time stochastic control systems. The proposed approach leverages the interconnection topology and a notion of so-called stochastic storage functions describing joint dissipativity-type properties of subsystems and their abstractions. In the first part of the paper, we derive dissipativity-type compositional conditions for quantifying the error between the interconnection of stochastic control subsystems and that of their abstractions. In the second part of the paper, we propose an approach to construct finite Markov decision processes together with their corresponding stochastic storage functions for classes of discrete-time control systems satisfying some incremental passivablity property. Under this property, one can construct finite Markov decision processes by a suitable discretization of the input and state sets. Moreover, we show that for linear stochastic control systems, the aforementioned property can be readily checked by some matrix inequality. We apply our proposed results to the temperature regulation in a circular building by constructing compositionally a finite Markov decision process of a network containing 200 rooms in which the compositionality condition does not require any constraint on the number or gains of the subsystems. We employ the constructed finite Markov decision process as a substitute to synthesize policies regulating the temperature in each room for a bounded time horizon.

SYJul 7, 2023
Neural Abstraction-Based Controller Synthesis and Deployment

Rupak Majumdar, Mahmoud Salamati, Sadegh Soudjani

Abstraction-based techniques are an attractive approach for synthesizing correct-by-construction controllers to satisfy high-level temporal requirements. A main bottleneck for successful application of these techniques is the memory requirement, both during controller synthesis and in controller deployment. We propose memory-efficient methods for mitigating the high memory demands of the abstraction-based techniques using neural network representations. To perform synthesis for reach-avoid specifications, we propose an on-the-fly algorithm that relies on compressed neural network representations of the forward and backward dynamics of the system. In contrast to usual applications of neural representations, our technique maintains soundness of the end-to-end process. To ensure this, we correct the output of the trained neural network such that the corrected output representations are sound with respect to the finite abstraction. For deployment, we provide a novel training algorithm to find a neural network representation of the synthesized controller and experimentally show that the controller can be correctly represented as a combination of a neural network and a look-up table that requires a substantially smaller memory. We demonstrate experimentally that our approach significantly reduces the memory requirements of abstraction-based methods. For the selected benchmarks, our approach reduces the memory requirements respectively for the synthesis and deployment by a factor of $1.31\times 10^5$ and $7.13\times 10^3$ on average, and up to $7.54\times 10^5$ and $3.18\times 10^4$. Although this reduction is at the cost of increased off-line computations to train the neural networks, all the steps of our approach are parallelizable and can be implemented on machines with higher number of processing units to reduce the required computational time.

LOJul 15, 2024
Data-Driven Abstractions via Binary-Tree Gaussian Processes for Formal Verification

Oliver Schön, Shammakh Naseer, Ben Wooding et al.

To advance formal verification of stochastic systems against temporal logic requirements for handling unknown dynamics, researchers have been designing data-driven approaches inspired by breakthroughs in the underlying machine learning techniques. As one promising research direction, abstraction-based solutions based on Gaussian process (GP) regression have become popular for their ability to learn a representation of the latent system from data with a quantified error. Results obtained based on this model are then translated to the true system via various methods. In a recent publication, GPs using a so-called binary-tree kernel have demonstrated a polynomial speedup w.r.t. the size of the data compared to their vanilla version, outcompeting all existing sparse GP approximations. Incidentally, the resulting binary-tree Gaussian process (BTGP) is characteristic for its piecewise-constant posterior mean and covariance functions, naturally abstracting the input space into discrete partitions. In this paper, we leverage this natural abstraction of the BTGP for formal verification, eliminating the need for cumbersome abstraction and error quantification procedures. We show that the BTGP allows us to construct an interval Markov chain model of the unknown system with a speedup that is polynomial w.r.t. the size of the abstraction compared to alternative approaches. We provide a delocalized error quantification via a unified formula even when the true dynamics do not live in the function space of the BTGP. This allows us to compute upper and lower bounds on the probability of satisfying reachability specifications that are robust to both aleatoric and epistemic uncertainties.

GTApr 13
Incremental Data-Driven Policy Synthesis via Game Abstractions

Irmak Sağlam, Mahdi Nazeri, Alessandro Abate et al.

We address the synthesis of control policies for unknown discrete-time stochastic dynamical systems to satisfy temporal logic objectives. We present a data-driven, abstraction-based control framework that integrates online learning with novel incremental game-solving. Under appropriate continuity assumptions, our method abstracts the system dynamics into a finite stochastic (2.5-player) game graph derived from data. Given a requirement over time on this graph, we compute the winning region -- i.e., the set of initial states from which the objective is satisfiable -- in the resulting game, together with a corresponding control policy. Our main contribution is the construction of abstractions, winning regions and control policies \emph{incrementally}, as data about the system dynamics accumulates. Concretely, our algorithm refines under- and over-approximations of reachable sets for each state-action pair as new data samples arrive. These refinements induce structural modifications in the game graph abstraction -- such as the addition or removal of nodes and edges -- which in turn modify the winning region. Crucially, we show that these updates are inherently monotonic: under-approximations only grow, over-approximations only shrink, and the winning region only expands. We exploit this monotonicity by defining an objective-induced ranking function on the nodes of the abstract game that increases monotonically as new data samples are incorporated. These ranks underpin our novel incremental game-solving algorithm, which employs customized gadgets (DAG-like subgames) within a rank-lifting algorithm to efficiently update the winning region. Numerical case studies demonstrate significant computational savings compared to the baseline approach, which re-solves the entire game from scratch whenever new data samples arrive.

SYApr 16
Temporal Logic Resilience for Continuous-time Systems

Ratnangshu Das, Negar Monir, Youssef Ait Si et al.

In this paper, we present a novel framework for quantifying a lower bound on resilience in continuous-time (non)linear systems subject to external disturbances while ensuring satisfaction of signal temporal logic specifications. Unlike robustness, which evaluates how well a system satisfies a specification under a given disturbance, resilience measures the maximum disturbance a system can tolerate from a given initial state while maintaining specification satisfaction. We first derive bounds on the perturbed trajectories and then use them to formulate a computational method based on scenario optimization to efficiently compute the maximum admissible disturbance. We validate our approach through case studies, including dc motor, temperature regulation, a nonlinear numerical example, and a vehicle collision avoidance case.

SYApr 12
Resilient and Effort-Optimal Controller Synthesis under Temporal Logic Specifications

Youssef Ait Si, Ratnangshu Das, Negar Monir et al.

In this paper, we consider the notions of effort and resilience of a dynamical control system defined by the maximum disturbance the system can withstand while satisfying given finite temporal logic specifications. Given a dynamical system and a specification, the objective is to synthesize the controller such that the system satisfies the specification while maximizing its resilience, taking into account input constraints. In addition, we introduce a new metric, called the effort metric, which characterizes the minimal input bound necessary to satisfy a given specification for a perturbed system. The problem for both metrics is formulated as a robust optimization program where the objective is to compute the maximum resilience for the system with input constraints or the minimal effort while simultaneously synthesizing the corresponding controller parameters. Moreover, we study the trade-off between resilience and effort, where we seek to maximize resilience and minimize the control effort. For linear systems and linear controllers, exact solutions are provided for the class of time-varying polytopic specifications for the closed-loop and open-loop systems. For the case of nonlinear systems, nonlinear controllers, and more general specifications, we leverage tools from the scenario optimization approach, offering a probabilistic guarantee of the solution as well as computational feasibility. Different case studies are presented to illustrate the theoretical results.

SYMay 21
Kernel-Based Safe Exploration in Deep Reinforcement Learning

Rupak Majumdar, Nikhil Singh, Sadegh Soudjani

Safety has been a major concern when deploying deep reinforcement learning algorithms in the real world. A promising direction that ensures that the learned policy does not visit unsafe regions is to learn a \emph{barrier function} along with the policy. A barrier is a function from states to reals that assigns low values to the initial states, high values to the unsafe states, and decreases in expectation on each transition; such a function can be used to bound the probability of reaching unsafe states. Previous attempts learned a barrier function directly from exploration data, but this required either large amounts of data or restrictions on the system dynamics. In this paper, we show how kernel embeddings can be used to learn barrier functions during deep reinforcement learning for stochastic systems with unknown dynamics. Our algorithm, \emph{kernel-based safe exploration (KBSE)}, learns an optimal policy and a barrier simultaneously during exploration. The barriers are computed iteratively, represented as conditional mean embeddings, and provide better probabilistic safety guarantees with more exploration. The exploration algorithm uses the learned barrier functions to identify safety violations. In the case of violation, it intervenes to modify the unsafe action to a safe action, thereby ensuring that the exploration is restricted to actions that bound the probability of reaching unsafe states. We evaluate KBSE on several complex continuous control benchmarks. Experimental results establish our new algorithm to be suitable for synthesizing control policies that are probabilistically safe without degradation in reward accumulation.

OCMay 10
Barrier Certificates for Uncertain Temporal Specifications

Mohammad H. Mamduhi, Sadegh Soudjani

This paper studies satisfying temporal logic specifications on stochastic dynamical systems, where the predicates evolve randomly over time. Such randomness may arise from uncertain environment models or external stochastic processes causing the sets associated with predicate satisfaction to vary in a non-deterministic manner. As a result, verifying whether a stochastic dynamical system satisfies a temporal specification depends also on the uncertainty in the predicates. We develop a certificate-based framework to bound the probability of satisfying temporal logic specifications with randomly evolving predicates. We first show that temporal logic specifications with stochastic predicates can be transformed to specifications with deterministic predicates on an augmented space which is extended to include the stochastic space of predicate's uncertainty. We then utilize barrier certificates on an augmented space to provide tractable optimization-based conditions and to avoid the computational burden of dynamic programming. Focusing on linear dynamics and safety-type specifications, we derive analytical conditions under which barrier certificates guarantee bounds on the probability of violating the stochastic safety predicates. The approach is demonstrated on numerical case studies.

SYDec 12, 2025
LUCID: Learning-Enabled Uncertainty-Aware Certification of Stochastic Dynamical Systems

Ernesto Casablanca, Oliver Schön, Paolo Zuliani et al.

Ensuring the safety of AI-enabled systems, particularly in high-stakes domains such as autonomous driving and healthcare, has become increasingly critical. Traditional formal verification tools fall short when faced with systems that embed both opaque, black-box AI components and complex stochastic dynamics. To address these challenges, we introduce LUCID (Learning-enabled Uncertainty-aware Certification of stochastIc Dynamical systems), a verification engine for certifying safety of black-box stochastic dynamical systems from a finite dataset of random state transitions. As such, LUCID is the first known tool capable of establishing quantified safety guarantees for such systems. Thanks to its modular architecture and extensive documentation, LUCID is designed for easy extensibility. LUCID employs a data-driven methodology rooted in control barrier certificates, which are learned directly from system transition data, to ensure formal safety guarantees. We use conditional mean embeddings to embed data into a reproducing kernel Hilbert space (RKHS), where an RKHS ambiguity set is constructed that can be inflated to robustify the result to out-of-distribution behavior. A key innovation within LUCID is its use of a finite Fourier kernel expansion to reformulate a semi-infinite non-convex optimization problem into a tractable linear program. The resulting spectral barrier allows us to leverage the fast Fourier transform to generate the relaxed problem efficiently, offering a scalable yet distributionally robust framework for verifying safety. LUCID thus offers a robust and efficient verification framework, able to handle the complexities of modern black-box systems while providing formal guarantees of safety. These unique capabilities are demonstrated on challenging benchmarks.

LGNov 12, 2025
Quasi-Newton Compatible Actor-Critic for Deterministic Policies

Arash Bahari Kordabad, Dean Brandner, Sebastien Gros et al.

In this paper, we propose a second-order deterministic actor-critic framework in reinforcement learning that extends the classical deterministic policy gradient method to exploit curvature information of the performance function. Building on the concept of compatible function approximation for the critic, we introduce a quadratic critic that simultaneously preserves the true policy gradient and an approximation of the performance Hessian. A least-squares temporal difference learning scheme is then developed to estimate the quadratic critic parameters efficiently. This construction enables a quasi-Newton actor update using information learned by the critic, yielding faster convergence compared to first-order methods. The proposed approach is general and applicable to any differentiable policy class. Numerical examples demonstrate that the method achieves improved convergence and performance over standard deterministic actor-critic baselines.

AINov 12, 2025
BarrierBench : Evaluating Large Language Models for Safety Verification in Dynamical Systems

Ali Taheri, Alireza Taban, Sadegh Soudjani et al.

Safety verification of dynamical systems via barrier certificates is essential for ensuring correctness in autonomous applications. Synthesizing these certificates involves discovering mathematical functions with current methods suffering from poor scalability, dependence on carefully designed templates, and exhaustive or incremental function-space searches. They also demand substantial manual expertise--selecting templates, solvers, and hyperparameters, and designing sampling strategies--requiring both theoretical and practical knowledge traditionally shared through linguistic reasoning rather than formalized methods. This motivates a key question: can such expert reasoning be captured and operationalized by language models? We address this by introducing an LLM-based agentic framework for barrier certificate synthesis. The framework uses natural language reasoning to propose, refine, and validate candidate certificates, integrating LLM-driven template discovery with SMT-based verification, and supporting barrier-controller co-synthesis to ensure consistency between safety certificates and controllers. To evaluate this capability, we introduce BarrierBench, a benchmark of 100 dynamical systems spanning linear, nonlinear, discrete-time, and continuous-time settings. Our experiments assess not only the effectiveness of LLM-guided barrier synthesis but also the utility of retrieval-augmented generation and agentic coordination strategies in improving its reliability and performance. Across these tasks, the framework achieves more than 90% success in generating valid certificates. By releasing BarrierBench and the accompanying toolchain, we aim to establish a community testbed for advancing the integration of language-based reasoning with formal verification in dynamical systems. The benchmark is publicly available at https://hycodev.com/dataset/barrierbench

AIMay 7
Safety Certification is Classification

Oliver Schön, Licio Romao, Sadegh Soudjani

The goal of this paper is certifying safety of dynamical systems subject to uncertainty. Existing approaches use trajectory data to estimate transition probabilities, and compute safety probabilities recursively via dynamic programming (DP). This recursion may lead to compounding errors in the certified safety probability, thus collapsing to a vacuous lower bound for growing horizons $T$. We propose a kernel embedding framework that treats safety certification as a classification problem on trajectory data, directly estimating the $T$-step safety probability without recursion. We show that the framework subsumes well-established approaches from the literature (e.g., barrier certificates, robust Markov models) as special cases, and allows us to go beyond their limitations. As the main consequence, it bypasses compounding error across the horizon and enables certification for systems with non-Markovian dynamics. We demonstrate that direct estimators remain stable independent of the certification horizon and in the non-Markovian setting, whilst DP-based certificates silently go unsound -- confirmed in simulation on a neural-controlled quadrotor.

SYMay 5
Almost Sure Reachability in Continuous-time Stochastic Systems

Arash Bahari Kordabad, Rupak Majumdar, Sadegh Soudjani

We provide certificates for almost sure reachability of continuous-time stochastic systems governed by stochastic differential equations (SDEs). We first show that a standard Euler-Maruyama discretization may fail to preserve almost sure reachability property of the system using a double-well Langevin system. This observation motivates us to develop certificates for almost sure reachability directly on the continuous-time system. We introduce a pair of certificates, a drift function and a variant function, and prove necessity and sufficiency for almost sure reachability of an open bounded target set. Using these certificates, for linear SDEs, we give a characterization of almost sure reachability in terms of the spectral structure of the system matrices. For polynomial SDEs, we fix a polynomial template for the drift function and choose the variant function template as an exponential function composed with a polynomial. This allows us to translate the conditions in the certificates into sum-of-squares (SOS) constraints. We then propose an alternating scheme to resolve bilinearities. We illustrate the approach on the double-well Langevin example, showing that continuous-time SOS certificates recover almost sure reachability that is lost under time discretization. Moreover, we verify the SOS approach on a polynomial system.

SYMar 15, 2024
Data-Driven Distributionally Robust Safety Verification Using Barrier Certificates and Conditional Mean Embeddings

Oliver Schön, Zhengang Zhong, Sadegh Soudjani

Algorithmic verification of realistic systems to satisfy safety and other temporal requirements has suffered from poor scalability of the employed formal approaches. To design systems with rigorous guarantees, many approaches still rely on exact models of the underlying systems. Since this assumption can rarely be met in practice, models have to be inferred from measurement data or are bypassed completely. Whilst former usually requires the model structure to be known a-priori and immense amounts of data to be available, latter gives rise to a plethora of restrictive mathematical assumptions about the unknown dynamics. In a pursuit of developing scalable formal verification algorithms without shifting the problem to unrealistic assumptions, we employ the concept of barrier certificates, which can guarantee safety of the system, and learn the certificate directly from a compact set of system trajectories. We use conditional mean embeddings to embed data from the system into a reproducing kernel Hilbert space (RKHS) and construct an RKHS ambiguity set that can be inflated to robustify the result w.r.t. a set of plausible transition kernels. We show how to solve the resulting program efficiently using sum-of-squares optimization and a Gaussian process envelope. Our approach lifts the need for restrictive assumptions on the system dynamics and uncertainty, and suggests an improvement in the sample complexity of verifying the safety of a system on a tested case study compared to a state-of-the-art approach.

SYApr 29, 2024
Safe Reach Set Computation via Neural Barrier Certificates

Alessandro Abate, Sergiy Bogomolov, Alec Edwards et al.

We present a novel technique for online safety verification of autonomous systems, which performs reachability analysis efficiently for both bounded and unbounded horizons by employing neural barrier certificates. Our approach uses barrier certificates given by parameterized neural networks that depend on a given initial set, unsafe sets, and time horizon. Such networks are trained efficiently offline using system simulations sampled from regions of the state space. We then employ a meta-neural network to generalize the barrier certificates to state space regions that are outside the training set. These certificates are generated and validated online as sound over-approximations of the reachable states, thus either ensuring system safety or activating appropriate alternative actions in unsafe scenarios. We demonstrate our technique on case studies from linear models to nonlinear control-dependent models for online autonomous driving scenarios.

LGDec 15, 2023
Assume-Guarantee Reinforcement Learning

Milad Kazemi, Mateo Perez, Fabio Somenzi et al.

We present a modular approach to \emph{reinforcement learning} (RL) in environments consisting of simpler components evolving in parallel. A monolithic view of such modular environments may be prohibitively large to learn, or may require unrealizable communication between the components in the form of a centralized controller. Our proposed approach is based on the assume-guarantee paradigm where the optimal control for the individual components is synthesized in isolation by making \emph{assumptions} about the behaviors of neighboring components, and providing \emph{guarantees} about their own behavior. We express these \emph{assume-guarantee contracts} as regular languages and provide automatic translations to scalar rewards to be used in RL. By combining local probabilities of satisfaction for each component, we provide a lower bound on the probability of satisfaction of the complete system. By solving a Markov game for each component, RL can produce a controller for each component that maximizes this lower bound. The controller utilizes the information it receives through communication, observations, and any knowledge of a coarse model of other agents. We experimentally demonstrate the efficiency of the proposed approach on a variety of case studies.

AIMay 21, 2025
Average Reward Reinforcement Learning for Omega-Regular and Mean-Payoff Objectives

Milad Kazemi, Mateo Perez, Fabio Somenzi et al.

Recent advances in reinforcement learning (RL) have renewed focus on the design of reward functions that shape agent behavior. Manually designing reward functions is tedious and error-prone. A principled alternative is to specify behaviors in a formal language that can be automatically translated into rewards. Omega-regular languages are a natural choice for this purpose, given their established role in formal verification and synthesis. However, existing methods using omega-regular specifications typically rely on discounted reward RL in episodic settings, with periodic resets. This setup misaligns with the semantics of omega-regular specifications, which describe properties over infinite behavior traces. In such cases, the average reward criterion and the continuing setting -- where the agent interacts with the environment over a single, uninterrupted lifetime -- are more appropriate. To address the challenges of infinite-horizon, continuing tasks, we focus on absolute liveness specifications -- a subclass of omega-regular languages that cannot be violated by any finite behavior prefix, making them well-suited to the continuing setting. We present the first model-free RL framework that translates absolute liveness specifications to average-reward objectives. Our approach enables learning in communicating MDPs without episodic resetting. We also introduce a reward structure for lexicographic multi-objective optimization, aiming to maximize an external average-reward objective among the policies that also maximize the satisfaction probability of a given omega-regular specification. Our method guarantees convergence in unknown communicating MDPs and supports on-the-fly reductions that do not require full knowledge of the environment, thus enabling model-free RL. Empirical results show our average-reward approach in continuing setting outperforms discount-based methods across benchmarks.

AINov 18, 2024
Regret-Free Reinforcement Learning for LTL Specifications

Rupak Majumdar, Mahmoud Salamati, Sadegh Soudjani

Learning to control an unknown dynamical system with respect to high-level temporal specifications is an important problem in control theory. We present the first regret-free online algorithm for learning a controller for linear temporal logic (LTL) specifications for systems with unknown dynamics. We assume that the underlying (unknown) dynamics is modeled by a finite-state and action Markov decision process (MDP). Our core technical result is a regret-free learning algorithm for infinite-horizon reach-avoid problems on MDPs. For general LTL specifications, we show that the synthesis problem can be reduced to a reach-avoid problem once the graph structure is known. Additionally, we provide an algorithm for learning the graph structure, assuming knowledge of a minimum transition probability, which operates independently of the main regret-free algorithm. Our LTL controller synthesis algorithm provides sharp bounds on how close we are to achieving optimal behavior after a finite number of learning episodes. In contrast, previous algorithms for LTL synthesis only provide asymptotic guarantees, which give no insight into the transient performance during the learning phase.

SYJun 20, 2025
Formal Control for Uncertain Systems via Contract-Based Probabilistic Surrogates (Extended Version)

Oliver Schön, Sofie Haesaert, Sadegh Soudjani

The requirement for identifying accurate system representations has not only been a challenge to fulfill, but it has compromised the scalability of formal methods, as the resulting models are often too complex for effective decision making with formal correctness and performance guarantees. Focusing on probabilistic simulation relations and surrogate models of stochastic systems, we propose an approach that significantly enhances the scalability and practical applicability of such simulation relations by eliminating the need to compute error bounds directly. As a result, we provide an abstraction-based technique that scales effectively to higher dimensions while addressing complex nonlinear agent-environment interactions with infinite-horizon temporal logic guarantees amidst uncertainty. Our approach trades scalability for conservatism favorably, as demonstrated on a complex high-dimensional vehicle intersection case study.

CRMar 18, 2021
Stochastic Simulation Techniques for Inference and Sensitivity Analysis of Bayesian Attack Graphs

Isaac Matthews, Sadegh Soudjani, Aad van Moorsel

A vulnerability scan combined with information about a computer network can be used to create an attack graph, a model of how the elements of a network could be used in an attack to reach specific states or goals in the network. These graphs can be understood probabilistically by turning them into Bayesian attack graphs, making it possible to quantitatively analyse the security of large networks. In the event of an attack, probabilities on the graph change depending on the evidence discovered (e.g., by an intrusion detection system or knowledge of a host's activity). Since such scenarios are difficult to solve through direct computation, we discuss and compare three stochastic simulation techniques for updating the probabilities dynamically based on the evidence and compare their speed and accuracy. From our experiments we conclude that likelihood weighting is most efficient for most uses. We also consider sensitivity analysis of BAGs, to identify the most critical nodes for protection of the network and solve the uncertainty problem in the assignment of priors to nodes. Since sensitivity analysis can easily become computationally expensive, we present and demonstrate an efficient sensitivity analysis approach that exploits a quantitative relation with stochastic inference.

CRMay 13, 2020
Cyclic Bayesian Attack Graphs: A Systematic Computational Approach

Isaac Matthews, John Mace, Sadegh Soudjani et al.

Attack graphs are commonly used to analyse the security of medium-sized to large networks. Based on a scan of the network and likelihood information of vulnerabilities, attack graphs can be transformed into Bayesian Attack Graphs (BAGs). These BAGs are used to evaluate how security controls affect a network and how changes in topology affect security. A challenge with these automatically generated BAGs is that cycles arise naturally, which make it impossible to use Bayesian network theory to calculate state probabilities. In this paper we provide a systematic approach to analyse and perform computations over cyclic Bayesian attack graphs. %thus providing a generic approach to handle cycles as well as unifying the theory of Bayesian attack graphs. Our approach first formally introduces two commonly used versions of Bayesian attack graphs and compares their expressiveness. We then present an interpretation of Bayesian attack graphs based on combinational logic circuits, which facilitates an intuitively attractive systematic treatment of cycles. We prove properties of the associated logic circuit and present an algorithm that computes state probabilities without altering the attack graphs (e.g., remove an arc to remove a cycle). Moreover, our algorithm deals seamlessly with all cycles without the need to identify their types. A set of experiments using synthetically created networks demonstrates the scalability of the algorithm on computer networks with hundreds of machines, each with multiple vulnerabilities.

SYMay 8, 2020
Data-Driven Verification under Signal Temporal Logic Constraints

Ali Salamati, Sadegh Soudjani, Majid Zamani

We consider systems under uncertainty whose dynamics are partially unknown. Our aim is to study satisfaction of temporal logic properties by trajectories of such systems. We express these properties as signal temporal logic formulas and check if the probability of satisfying the property is at least a given threshold. Since the dynamics are parameterized and partially unknown, we collect data from the system and employ Bayesian inference techniques to associate a confidence value to the satisfaction of the property. The main novelty of our approach is to combine both data-driven and model-based techniques in order to have a two-layer probabilistic reasoning over the behavior of the system: one layer is related to the stochastic noise inside the system and the next layer is related to the noisy data collected from the system. We provide approximate algorithms for computing the confidence for linear dynamical systems.

SYMay 4, 2020
Formal Policy Synthesis for Continuous-Space Systems via Reinforcement Learning

Milad Kazemi, Sadegh Soudjani

This paper studies satisfaction of temporal properties on unknown stochastic processes that have continuous state spaces. We show how reinforcement learning (RL) can be applied for computing policies that are finite-memory and deterministic using only the paths of the stochastic process. We address properties expressed in linear temporal logic (LTL) and use their automaton representation to give a path-dependent reward function maximised via the RL algorithm. We develop the required assumptions and theories for the convergence of the learned policy to the optimal policy in the continuous state space. To improve the performance of the learning on the constructed sparse reward function, we propose a sequential learning procedure based on a sequence of labelling functions obtained from the positive normal form of the LTL specification. We use this procedure to guide the RL algorithm towards a policy that converges to an optimal policy under suitable assumptions on the process. We demonstrate the approach on a 4-dim cart-pole system and 6-dim boat driving problem.

SYMar 2, 2020
Formal Controller Synthesis for Continuous-Space MDPs via Model-Free Reinforcement Learning

Abolfazl Lavaei, Fabio Somenzi, Sadegh Soudjani et al.

A novel reinforcement learning scheme to synthesize policies for continuous-space Markov decision processes (MDPs) is proposed. This scheme enables one to apply model-free, off-the-shelf reinforcement learning algorithms for finite MDPs to compute optimal strategies for the corresponding continuous-space MDPs without explicitly constructing the finite-state abstraction. The proposed approach is based on abstracting the system with a finite MDP (without constructing it explicitly) with unknown transition probabilities, synthesizing strategies over the abstract MDP, and then mapping the results back over the concrete continuous-space MDP with approximate optimality guarantees. The properties of interest for the system belong to a fragment of linear temporal logic, known as syntactically co-safe linear temporal logic (scLTL), and the synthesis requirement is to maximize the probability of satisfaction within a given bounded time horizon. A key contribution of the paper is to leverage the classical convergence results for reinforcement learning on finite MDPs and provide control strategies maximizing the probability of satisfaction over unknown, continuous-space MDPs while providing probabilistic closeness guarantees. Automata-based reward functions are often sparse; we present a novel potential-based reward shaping technique to produce dense rewards to speed up learning. The effectiveness of the proposed approach is demonstrated by applying it to three physical benchmarks concerning the regulation of a room's temperature, control of a road traffic cell, and of a 7-dimensional nonlinear model of a BMW 320i car.

SYMay 9, 2019
Compositional Construction of Infinite Abstractions for Networks of Stochastic Control Systems

Abolfazl Lavaei, Sadegh Soudjani, Majid Zamani

This paper is concerned with a compositional approach for constructing infinite abstractions of interconnected discrete-time stochastic control systems. The proposed approach uses the interconnection matrix and joint dissipativity-type properties of subsystems and their abstractions described by a new notion of so-called stochastic storage functions. The interconnected abstraction framework is based on new notions of so-called stochastic simulation functions, using which one can quantify the distance between original interconnected stochastic control systems and interconnected abstractions in the probabilistic setting. In the first part of the paper, we derive dissipativity-type compositional reasoning for the quantification of the distance in probability between the interconnection of stochastic control subsystems and that of their abstractions. Moreover, we focus on a class of discrete-time nonlinear stochastic control systems with independent noises in the abstract and concrete subsystems, and propose a computational scheme to construct abstractions together with their corresponding stochastic storage functions. In the second part of the paper, we consider specifications expressed as syntactically co-safe linear temporal logic formulae and show how a synthesized policy for the abstract system can be refined to a policy for the original system while providing guarantee on the probability of satisfaction. We demonstrate the effectiveness of the proposed results by constructing an abstraction (totally 3 dimensions) of the interconnection of three discrete-time nonlinear stochastic control subsystems (together 222 dimensions) in a compositional fashion. We also employ the abstraction as a substitute to synthesize a controller enforcing a syntactically co-safe linear temporal logic specification.

LGJan 21, 2019
Perception-in-the-Loop Adversarial Examples

Mahmoud Salamati, Sadegh Soudjani, Rupak Majumdar

We present a scalable, black box, perception-in-the-loop technique to find adversarial examples for deep neural network classifiers. Black box means that our procedure only has input-output access to the classifier, and not to the internal structure, parameters, or intermediate confidence values. Perception-in-the-loop means that the notion of proximity between inputs can be directly queried from human participants rather than an arbitrarily chosen metric. Our technique is based on covariance matrix adaptation evolution strategy (CMA-ES), a black box optimization approach. CMA-ES explores the search space iteratively in a black box manner, by generating populations of candidates according to a distribution, choosing the best candidates according to a cost function, and updating the posterior distribution to favor the best candidates. We run CMA-ES using human participants to provide the fitness function, using the insight that the choice of best candidates in CMA-ES can be naturally modeled as a perception task: pick the top $k$ inputs perceptually closest to a fixed input. We empirically demonstrate that finding adversarial examples is feasible using small populations and few iterations. We compare the performance of CMA-ES on the MNIST benchmark with other black-box approaches using $L_p$ norms as a cost function, and show that it performs favorably both in terms of success in finding adversarial examples and in minimizing the distance between the original and the adversarial input. In experiments on the MNIST, CIFAR10, and GTSRB benchmarks, we demonstrate that CMA-ES can find perceptually similar adversarial inputs with a small number of iterations and small population sizes when using perception-in-the-loop. Finally, we show that networks trained specifically to be robust against $L_\infty$ norm can still be susceptible to perceptually similar adversarial examples.

SYAug 9, 2017
Compositional Abstraction-Based Controller Synthesis for Continuous-Time Systems

Kaushik Mallik, Anne-Kathrin Schmuck, Sadegh Soudjani et al.

Controller synthesis techniques for continuous systems with respect to temporal logic specifications typically use a finite-state symbolic abstraction of the system model. Constructing this abstraction for the entire system is computationally expensive, and does not exploit natural decompositions of many systems into interacting components. We describe a methodology for compositional symbolic abstraction to help scale controller synthesis for temporal logic to larger systems. We introduce a new relation, called (approximate) disturbance bisimulation, as the basis for compositional symbolic abstractions. Disturbance bisimulation strengthens the standard approximate alternating bisimulation relation used in control. It extends naturally to systems which are composed of weakly interconnected sub-components possibly connected in feedback, and models the coupling signals as disturbances. After proving this composability of disturbance bisimulation for metric systems we apply this result to the compositional abstraction of networks of input-to-state stable deterministic non-linear control systems. We give conditions that allow to construct finite-state abstractions compositionally for each component in such a network, so that the abstractions are simultaneously disturbance bisimilar to their continuous counterparts. Combining these two results, we show conditions under which one can compositionally abstract a network of non-linear control systems in a modular way while ensuring that the final composed abstraction is disturbance bisimilar to the original system. We discuss how we get a compositional abstraction-based controller synthesis methodology for networks of such systems against local temporal specifications as a by-product of our construction.