30.2SYJun 3
Training with Hard Constraints: Learning Neural Certificates and Controllers for SDEsChun-Wei Kong, Sebastian Escobar, Ibon Gracia et al.
Due to their expressive power, neural networks (NNs) are promising templates for functional optimization problems, particularly for reach-avoid certificate generation for systems governed by stochastic differential equations (SDEs). However, ensuring hard-constraint satisfaction remains a major challenge. In this work, we propose two constraint-driven training frameworks with guarantees for supermartingale-based neural certificate construction and controller synthesis for SDEs. The first approach enforces certificate inequalities via domain discretization and a bound-based loss, guaranteeing global validity once the loss reaches zero. We show that this method also enables joint NN controller-certificate synthesis with hard guarantees. For high-dimensional systems where discretization becomes prohibitive, we introduce a partition-free, scenario-based training method that provides arbitrarily tight PAC guarantees for certificate constraint satisfaction. Benchmarks demonstrate scalability of the bound-based method up to 5D, outperforming the state of the art, and scalability of the scenario-based approach to at least 10D with high-confidence guarantees.
SYNov 2, 2022
Interval Markov Decision Processes with Continuous Action-SpacesGiannis Delimpaltadakis, Morteza Lahijanian, Manuel Mazo et al.
Interval Markov Decision Processes (IMDPs) are finite-state uncertain Markov models, where the transition probabilities belong to intervals. Recently, there has been a surge of research on employing IMDPs as abstractions of stochastic systems for control synthesis. However, due to the absence of algorithms for synthesis over IMDPs with continuous action-spaces, the action-space is assumed discrete a-priori, which is a restrictive assumption for many applications. Motivated by this, we introduce continuous-action IMDPs (caIMDPs), where the bounds on transition probabilities are functions of the action variables, and study value iteration for maximizing expected cumulative rewards. Specifically, we decompose the max-min problem associated to value iteration to $|\mathcal{Q}|$ max problems, where $|\mathcal{Q}|$ is the number of states of the caIMDP. Then, exploiting the simple form of these max problems, we identify cases where value iteration over caIMDPs can be solved efficiently (e.g., with linear or convex programming). We also gain other interesting insights: e.g., in certain cases where the action set $\mathcal{A}$ is a polytope, synthesis over a discrete-action IMDP, where the actions are the vertices of $\mathcal{A}$, is sufficient for optimality. We demonstrate our results on a numerical example. Finally, we include a short discussion on employing caIMDPs as abstractions for control synthesis.
SYMar 24, 2011
Probabilistically Safe Vehicle Control in a Hostile EnvironmentIgor Cizelj, Xu Chu Ding, Morteza Lahijanian et al.
In this paper we present an approach to control a vehicle in a hostile environment with static obstacles and moving adversaries. The vehicle is required to satisfy a mission objective expressed as a temporal logic specification over a set of properties satisfied at regions of a partitioned environment. We model the movements of adversaries in between regions of the environment as Poisson processes. Furthermore, we assume that the time it takes for the vehicle to traverse in between two facets of each region is exponentially distributed, and we obtain the rate of this exponential distribution from a simulator of the environment. We capture the motion of the vehicle and the vehicle updates of adversaries distributions as a Markov Decision Process. Using tools in Probabilistic Computational Tree Logic, we find a control strategy for the vehicle that maximizes the probability of accomplishing the mission objective. We demonstrate our approach with illustrative case studies.
LGJun 19, 2023
BNN-DP: Robustness Certification of Bayesian Neural Networks via Dynamic ProgrammingSteven Adams, Andrea Patane, Morteza Lahijanian et al.
In this paper, we introduce BNN-DP, an efficient algorithmic framework for analysis of adversarial robustness of Bayesian Neural Networks (BNNs). Given a compact set of input points $T\subset \mathbb{R}^n$, BNN-DP computes lower and upper bounds on the BNN's predictions for all the points in $T$. The framework is based on an interpretation of BNNs as stochastic dynamical systems, which enables the use of Dynamic Programming (DP) algorithms to bound the prediction range along the layers of the network. Specifically, the method uses bound propagation techniques and convex relaxations to derive a backward recursion procedure to over-approximate the prediction range of the BNN with piecewise affine functions. The algorithm is general and can handle both regression and classification tasks. On a set of experiments on various regression and classification tasks and BNN architectures, we show that BNN-DP outperforms state-of-the-art methods by up to four orders of magnitude in both tightness of the bounds and computational efficiency.
ROAug 30, 2011
Least Squares Temporal Difference Actor-Critic Methods with Applications to Robot Motion ControlReza Moazzez Estanjini, Xu Chu Ding, Morteza Lahijanian et al.
We consider the problem of finding a control policy for a Markov Decision Process (MDP) to maximize the probability of reaching some states while avoiding some other states. This problem is motivated by applications in robotics, where such problems naturally arise when probabilistic models of robot motion are required to satisfy temporal logic task specifications. We transform this problem into a Stochastic Shortest Path (SSP) problem and develop a new approximate dynamic programming algorithm to solve it. This algorithm is of the actor-critic type and uses a least-square temporal difference learning method. It operates on sample paths of the system and optimizes the policy within a pre-specified class parameterized by a parsimonious set of parameters. We show its convergence to a policy corresponding to a stationary point in the parameters' space. Simulation results confirm the effectiveness of the proposed solution.
21.4ROMay 26
Provably Safe Motion Planning Under Unknown DisturbancesIbon Gracia, Qi Heng Ho, Luca Laurenti et al.
We present a provably safe sampling-based motion planning algorithm for robotic systems affected by random disturbances of unknown distribution. We consider systems with linear or linearizable dynamics evolving in workspace with arbitrary-shaped obstacles subject to state and control constraints. Safety requirements are formulated as chance-constraints. Our approach leverages data from trajectories of the system to learn a Wasserstein ambiguity tube, i.e., a sequence of ambiguity sets, which contains the trajectory of the system's state distribution with high confidence. This ambiguity tube is then used in a probabilistically complete algorithm to grow a sampling-based motion planning tree that respects the constraints of the problem. We show that learning several lower-dimensional ambiguity tubes instead of a single high-dimensional one effectively reduces the conservatism and boosts scalability. Additionally, we design an efficient bandit-based validity checker that remarkably increases the empirical performance of our approach without sacrificing probabilistic completeness. Case studies show our algorithm finds valid plans in cluttered environments under strict safety thresholds, outperforming state-of-the-art methods.
LGAug 16, 2024
Error Bounds For Gaussian Process Regression Under Bounded Support Noise With Applications To Safety CertificationRobert Reed, Luca Laurenti, Morteza Lahijanian
Gaussian Process Regression (GPR) is a powerful and elegant method for learning complex functions from noisy data with a wide range of applications, including in safety-critical domains. Such applications have two key features: (i) they require rigorous error quantification, and (ii) the noise is often bounded and non-Gaussian due to, e.g., physical constraints. While error bounds for applying GPR in the presence of non-Gaussian noise exist, they tend to be overly restrictive and conservative in practice. In this paper, we provide novel error bounds for GPR under bounded support noise. Specifically, by relying on concentration inequalities and assuming that the latent function has low complexity in the reproducing kernel Hilbert space (RKHS) corresponding to the GP kernel, we derive both probabilistic and deterministic bounds on the error of the GPR. We show that these errors are substantially tighter than existing state-of-the-art bounds and are particularly well-suited for GPR with neural network kernels, i.e., Deep Kernel Learning (DKL). Furthermore, motivated by applications in safety-critical domains, we illustrate how these bounds can be combined with stochastic barrier functions to successfully quantify the safety probability of an unknown dynamical system from finite data. We validate the efficacy of our approach through several benchmarks and comparisons against existing bounds. The results show that our bounds are consistently smaller, and that DKLs can produce error bounds tighter than sample noise, significantly improving the safety probability of control systems.
LGJul 26, 2024
Finite Neural Networks as Mixtures of Gaussian Processes: From Provable Error Bounds to Prior SelectionSteven Adams, Andrea Patanè, Morteza Lahijanian et al.
Infinitely wide or deep neural networks (NNs) with independent and identically distributed (i.i.d.) parameters have been shown to be equivalent to Gaussian processes. Because of the favorable properties of Gaussian processes, this equivalence is commonly employed to analyze neural networks and has led to various breakthroughs over the years. However, neural networks and Gaussian processes are equivalent only in the limit; in the finite case there are currently no methods available to approximate a trained neural network with a Gaussian model with bounds on the approximation error. In this work, we present an algorithmic framework to approximate a neural network of finite width and depth, and with not necessarily i.i.d. parameters, with a mixture of Gaussian processes with error bounds on the approximation error. In particular, we consider the Wasserstein distance to quantify the closeness between probabilistic models and, by relying on tools from optimal transport and Gaussian processes, we iteratively approximate the output distribution of each layer of the neural network as a mixture of Gaussian processes. Crucially, for any NN and $ε>0$ our approach is able to return a mixture of Gaussian processes that is $ε$-close to the NN at a finite set of input points. Furthermore, we rely on the differentiability of the resulting error bound to show how our approach can be employed to tune the parameters of a NN to mimic the functional behavior of a given Gaussian process, e.g., for prior selection in the context of Bayesian inference. We empirically investigate the effectiveness of our results on both regression and classification problems with various neural network architectures. Our experiments highlight how our results can represent an important step towards understanding neural network predictions and formally quantifying their uncertainty.
SYSep 12, 2023
Promises of Deep Kernel Learning for Control SynthesisRobert Reed, Luca Laurenti, Morteza Lahijanian
Deep Kernel Learning (DKL) combines the representational power of neural networks with the uncertainty quantification of Gaussian Processes. Hence, it is potentially a promising tool to learn and control complex dynamical systems. In this work, we develop a scalable abstraction-based framework that enables the use of DKL for control synthesis of stochastic dynamical systems against complex specifications. Specifically, we consider temporal logic specifications and create an end-to-end framework that uses DKL to learn an unknown system from data and formally abstracts the DKL model into an Interval Markov Decision Process (IMDP) to perform control synthesis with correctness guarantees. Furthermore, we identify a deep architecture that enables accurate learning and efficient abstraction computation. The effectiveness of our approach is illustrated on various benchmarks, including a 5-D nonlinear stochastic system, showing how control synthesis with DKL can substantially outperform state-of-the-art competitive methods.
ROJun 22, 2023
Optimal Cost-Preference Trade-off Planning with Multiple Temporal TasksPeter Amorese, Morteza Lahijanian
Autonomous robots are increasingly utilized in realistic scenarios with multiple complex tasks. In these scenarios, there may be a preferred way of completing all of the given tasks, but it is often in conflict with optimal execution. Recent work studies preference-based planning, however, they have yet to extend the notion of preference to the behavior of the robot with respect to each task. In this work, we introduce a novel notion of preference that provides a generalized framework to express preferences over individual tasks as well as their relations. Then, we perform an optimal trade-off (Pareto) analysis between behaviors that adhere to the user's preference and the ones that are resource optimal. We introduce an efficient planning framework that generates Pareto-optimal plans given user's preference by extending A* search. Further, we show a method of computing the entire Pareto front (the set of all optimal trade-offs) via an adaptation of a multi-objective A* algorithm. We also present a problem-agnostic search heuristic to enable scalability. We illustrate the power of the framework on both mobile robots and manipulators. Our benchmarks show the effectiveness of the heuristic with up to 2-orders of magnitude speedup.
AIOct 15, 2023
Recursively-Constrained Partially Observable Markov Decision ProcessesQi Heng Ho, Tyler Becker, Benjamin Kraske et al.
Many sequential decision problems involve optimizing one objective function while imposing constraints on other objectives. Constrained Partially Observable Markov Decision Processes (C-POMDP) model this case with transition uncertainty and partial observability. In this work, we first show that C-POMDPs violate the optimal substructure property over successive decision steps and thus may exhibit behaviors that are undesirable for some (e.g., safety critical) applications. Additionally, online re-planning in C-POMDPs is often ineffective due to the inconsistency resulting from this violation. To address these drawbacks, we introduce the Recursively-Constrained POMDP (RC-POMDP), which imposes additional history-dependent cost constraints on the C-POMDP. We show that, unlike C-POMDPs, RC-POMDPs always have deterministic optimal policies and that optimal policies obey Bellman's principle of optimality. We also present a point-based dynamic programming algorithm for RC-POMDPs. Evaluations on benchmark problems demonstrate the efficacy of our algorithm and show that policies for RC-POMDPs produce more desirable behaviors than policies for C-POMDPs.
SYApr 14, 2023
Sampling-based Reactive Synthesis for Nondeterministic Hybrid SystemsQi Heng Ho, Zachary N. Sunberg, Morteza Lahijanian
This paper introduces a sampling-based strategy synthesis algorithm for nondeterministic hybrid systems with complex continuous dynamics under temporal and reachability constraints. We model the evolution of the hybrid system as a two-player game, where the nondeterminism is an adversarial player whose objective is to prevent achieving temporal and reachability goals. The aim is to synthesize a winning strategy -- a reactive (robust) strategy that guarantees the satisfaction of the goals under all possible moves of the adversarial player. Our proposed approach involves growing a (search) game-tree in the hybrid space by combining sampling-based motion planning with a novel bandit-based technique to select and improve on partial strategies. We show that the algorithm is probabilistically complete, i.e., the algorithm will asymptotically almost surely find a winning strategy, if one exists. The case studies and benchmark results show that our algorithm is general and effective, and consistently outperforms state of the art algorithms.
14.5AIApr 23
Robustness Analysis of POMDP Policies to Observation PerturbationsBenjamin Kraske, Qi Heng Ho, Federico Rossi et al.
Policies for Partially Observable Markov Decision Processes (POMDPs) are often designed using a nominal system model. In practice, this model can deviate from the true system during deployment due to factors such as calibration drift or sensor degradation, leading to unexpected performance degradation. This work studies policy robustness against deviations in the POMDP observation model. We introduce the Policy Observation Robustness Problem: to determine the maximum tolerable deviation in a POMDP's observation model that guarantees the policy's value remains above a specified threshold. We analyze two variants: the sticky variant, where deviations are dependent on state and actions, and the non-sticky variant, where they can be history-dependent. We show that the Policy Observation Robustness Problem can be formulated as a bi-level optimization problem in which the inner optimization is monotonic in the size of the observation deviation. This enables efficient solutions using root-finding algorithms in the outer optimization. For the non-sticky variant, we show that when policies are represented with finite-state controllers (FSCs) it is sufficient to consider observations which depend on nodes in the FSC rather than full histories. We present Robust Interval Search, an algorithm with soundness and convergence guarantees, for both the sticky and non-sticky variants. We show this algorithm has polynomial time complexity in the non-sticky variant and at most exponential time complexity in the sticky variant. We provide experimental results validating and demonstrating the scalability of implementations of Robust Interval Search to POMDP problems with tens of thousands of states. We also provide case studies from robotics and operations research which demonstrate the practical utility of the problem and algorithms.
48.8ROApr 22
Stochastic Barrier Certificates in the Presence of Dynamic ObstaclesRayan Mazouz, Luca Laurenti, Morteza Lahijanian
Safety of stochastic dynamic systems in environments with dynamic obstacles is studied in this paper through the lens of stochastic barrier functions. We introduce both time-invariant and time-varying barrier certificates for discrete-time, continuous-space systems subject to uncertainty, which provide certified lower bounds on the probability of remaining within a safe set over a finite horizon. These certificates explicitly account for time-varying unsafe regions induced by obstacle dynamics. By leveraging Bellman's optimality perspective, the time-varying formulation directly captures temporal structure and yields less conservative bounds than state-of-the-art approaches. By restricting certificates to polynomial functions, we show that time-varying barrier synthesis can be formulated as a convex sum-of-squares program, enabling tractable optimization. Empirical evaluations on nonlinear systems with dynamic obstacles show that time-varying certificates consistently achieve tight guarantees, demonstrating improved accuracy and scalability over state-of-the-art methods.
29.3ROApr 1
Sampling-based Task and Kinodynamic Motion Planning under Semantic UncertaintyQi Heng Ho, Zachary N. Sunberg, Morteza Lahijanian
This paper tackles the problem of integrated task and kinodynamic motion planning in uncertain environments. We consider a robot with nonlinear dynamics tasked with a Linear Temporal Logic over finite traces ($\ltlf$) specification operating in a partially observable environment. Specifically, the uncertainty is in the semantic labels of the environment. We show how the problem can be modeled as a Partially Observable Stochastic Hybrid System that captures the robot dynamics, $\ltlf$ task, and uncertainty in the environment state variables. We propose an anytime algorithm that takes advantage of the structure of the hybrid system, and combines the effectiveness of decision-making techniques and sampling-based motion planning. We prove the soundness and asymptotic optimality of the algorithm. Results show the efficacy of our algorithm in uncertain environments, and that it consistently outperforms baseline methods.
29.0SYApr 15
On the Optimality of Uncertain MDP AbstractionsIbon Gracia, Morteza Lahijanian
We study the asymptotic optimality of abstraction-based control synthesis algorithms. Specifically, we consider uncertain MDP (UMDP) abstraction, and investigate whether refinement leads to optimal results, i.e., an optimal controller and zero error bound. Additionally, we study completeness of abstraction-refinement algorithms, i.e., that the algorithm produces near-optimal results in finite time. The focus is on nonlinear stochastic systems with general vector fields and temporal logic specifications. We present an algorithm that abstracts the system into a UMDP and synthesizes a controller with performance guarantees via robust dynamic programming. Then, the algorithm iteratively refines the abstraction until a near-optimality criterion is met. A thorough theoretical analysis reveals a sufficient condition, which we denote vanishing ambiguity, guaranteeing asymptotic optimality of the abstraction process and completeness of the algorithm. We show that set-valued MDP abstractions satisfy this criterion, whereas interval MDP abstractions lack such a guarantee.
66.1SYMar 26
Time-Varying Reach-Avoid Control Certificates for Stochastic SystemsRayan Mazouz, Luca Laurenti, Morteza Lahijanian
Reach-avoid analysis is fundamental to reasoning about the safety and goal-reaching behavior of dynamical systems, and serves as a foundation for specifying and verifying more complex control objectives. This paper introduces a reach-avoid certificate framework for discrete-time, continuous-space stochastic systems over both finite- and infinite-horizon settings. We propose two formulations: time-varying and time-invariant certificates. We also show how these certificates can be synthesized using sum-of-squares (SOS) optimization, providing a convex formulation for verifying a given controller. Furthermore, we present an SOS-based method for the joint synthesis of an optimal feedback controller and its corresponding reach-avoid certificate, enabling the maximization of the probability of reaching the target set while avoiding unsafe regions. Case studies and benchmark results demonstrate the efficacy of the proposed framework in certifying and controlling stochastic systems with continuous state and action spaces.
33.3LGApr 8
Learning Markov Processes as Sum-of-Square Forms for Analytical Belief PropagationPeter Amorese, Morteza Lahijanian
Harnessing the predictive capability of Markov process models requires propagating probability density functions (beliefs) through the model. For many existing models however, belief propagation is analytically infeasible, requiring approximation or sampling to generate predictions. This paper proposes a functional modeling framework leveraging sparse Sum-of-Squares (SoS) forms for valid (conditional) density estimation. We study the theoretical restrictions of modeling conditional densities using the SoS form, and propose a novel functional form for addressing such limitations. The proposed architecture enables generalized simultaneous learning of basis functions and coefficients, while preserving analytical belief propagation. In addition, we propose a training method that allows for exact adherence to the normalization and non-negativity constraints. Our results show that the proposed method achieves accuracy comparable to state-of-the-art approaches while requiring significantly less memory in low-dimensional spaces, and it further scales to 12D systems when existing methods fail beyond 2D.
LGMar 8, 2024
Shielded Deep Reinforcement Learning for Complex Spacecraft TaskingRobert Reed, Hanspeter Schaub, Morteza Lahijanian
Autonomous spacecraft control via Shielded Deep Reinforcement Learning (SDRL) has become a rapidly growing research area. However, the construction of shields and the definition of tasking remains informal, resulting in policies with no guarantees on safety and ambiguous goals for the RL agent. In this paper, we first explore the use of formal languages, namely Linear Temporal Logic (LTL), to formalize spacecraft tasks and safety requirements. We then define a manner in which to construct a reward function from a co-safe LTL specification automatically for effective training in SDRL framework. We also investigate methods for constructing a shield from a safe LTL specification for spacecraft applications and propose three designs that provide probabilistic guarantees. We show how these shields interact with different policies and the flexibility of the reward structure through several experiments.
LGApr 30, 2024
Data-Driven Permissible Safe Control with Barrier CertificatesRayan Mazouz, John Skovbekk, Frederik Baymler Mathiesen et al.
This paper introduces a method of identifying a maximal set of safe strategies from data for stochastic systems with unknown dynamics using barrier certificates. The first step is learning the dynamics of the system via Gaussian process (GP) regression and obtaining probabilistic errors for this estimate. Then, we develop an algorithm for constructing piecewise stochastic barrier functions to find a maximal permissible strategy set using the learned GP model, which is based on sequentially pruning the worst controls until a maximal set is identified. The permissible strategies are guaranteed to maintain probabilistic safety for the true system. This is especially important for learning-enabled systems, because a rich strategy space enables additional data collection and complex behaviors while remaining safe. Case studies on linear and nonlinear systems demonstrate that increasing the size of the dataset for learning the system grows the permissible strategy set.
LGDec 15, 2025
Scalable Formal Verification via Autoencoder Latent Space AbstractionRobert Reed, Luca Laurenti, Morteza Lahijanian
Finite Abstraction methods provide a powerful formal framework for proving that systems satisfy their specifications. However, these techniques face scalability challenges for high-dimensional systems, as they rely on state-space discretization which grows exponentially with dimension. Learning-based approaches to dimensionality reduction, utilizing neural networks and autoencoders, have shown great potential to alleviate this problem. However, ensuring the correctness of the resulting verification results remains an open question. In this work, we provide a formal approach to reduce the dimensionality of systems via convex autoencoders and learn the dynamics in the latent space through a kernel-based method. We then construct a finite abstraction from the learned model in the latent space and guarantee that the abstraction contains the true behaviors of the original system. We show that the verification results in the latent space can be mapped back to the original system. Finally, we demonstrate the effectiveness of our approach on multiple systems, including a 26D system controlled by a neural network, showing significant scalability improvements without loss of rigor.
LGSep 19, 2025
Universal Learning of Stochastic Dynamics for Exact Belief Propagation using Bernstein Normalizing FlowsPeter Amorese, Morteza Lahijanian
Predicting the distribution of future states in a stochastic system, known as belief propagation, is fundamental to reasoning under uncertainty. However, nonlinear dynamics often make analytical belief propagation intractable, requiring approximate methods. When the system model is unknown and must be learned from data, a key question arises: can we learn a model that (i) universally approximates general nonlinear stochastic dynamics, and (ii) supports analytical belief propagation? This paper establishes the theoretical foundations for a class of models that satisfy both properties. The proposed approach combines the expressiveness of normalizing flows for density estimation with the analytical tractability of Bernstein polynomials. Empirical results show the efficacy of our learned model over state-of-the-art data-driven methods for belief propagation, especially for highly non-linear systems with non-additive, non-Gaussian noise.
LGOct 28, 2024
Error Bounds for Physics-Informed Neural Networks in Fokker-Planck PDEsChun-Wei Kong, Luca Laurenti, Jay McMahon et al.
Stochastic differential equations are commonly used to describe the evolution of stochastic processes. The state uncertainty of such processes is best represented by the probability density function (PDF), whose evolution is governed by the Fokker-Planck partial differential equation (FP-PDE). However, it is generally infeasible to solve the FP-PDE in closed form. In this work, we show that physics-informed neural networks (PINNs) can be trained to approximate the solution PDF. Our main contribution is the analysis of PINN approximation error: we develop a theoretical framework to construct tight error bounds using PINNs. In addition, we derive a practical error bound that can be efficiently constructed with standard training methods. We discuss that this error-bound framework generalizes to approximate solutions of other linear PDEs. Empirical results on nonlinear, high-dimensional, and chaotic systems validate the correctness of our error bounds while demonstrating the scalability of PINNs and their significant computational speedup in obtaining accurate PDF solutions compared to the Monte Carlo approach.
ROJun 17, 2024
Online Pareto-Optimal Decision-Making for Complex Tasks using Active InferencePeter Amorese, Shohei Wakayama, Nisar Ahmed et al.
When a robot autonomously performs a complex task, it frequently must balance competing objectives while maintaining safety. This becomes more difficult in uncertain environments with stochastic outcomes. Enhancing transparency in the robot's behavior and aligning with user preferences are also crucial. This paper introduces a novel framework for multi-objective reinforcement learning that ensures safe task execution, optimizes trade-offs between objectives, and adheres to user preferences. The framework has two main layers: a multi-objective task planner and a high-level selector. The planning layer generates a set of optimal trade-off plans that guarantee satisfaction of a temporal logic task. The selector uses active inference to decide which generated plan best complies with user preferences and aids learning. Operating iteratively, the framework updates a parameterized learning model based on collected data. Case studies and benchmarks on both manipulation and mobile robots show that our framework outperforms other methods and (i) learns multiple optimal trade-offs, (ii) adheres to a user preference, and (iii) allows the user to adjust the balance between (i) and (ii).
AIJun 5, 2024
Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability ObjectivesQi Heng Ho, Martin S. Feather, Federico Rossi et al.
Partially Observable Markov Decision Processes (POMDPs) are powerful models for sequential decision making under transition and observation uncertainties. This paper studies the challenging yet important problem in POMDPs known as the (indefinite-horizon) Maximal Reachability Probability Problem (MRPP), where the goal is to maximize the probability of reaching some target states. This is also a core problem in model checking with logical specifications and is naturally undiscounted (discount factor is one). Inspired by the success of point-based methods developed for discounted problems, we study their extensions to MRPP. Specifically, we focus on trial-based heuristic search value iteration techniques and present a novel algorithm that leverages the strengths of these techniques for efficient exploration of the belief space (informed search via value bounds) while addressing their drawbacks in handling loops for indefinite-horizon problems. The algorithm produces policies with two-sided bounds on optimal reachability probabilities. We prove convergence to an optimal policy from below under certain conditions. Experimental evaluations on a suite of benchmarks show that our algorithm outperforms existing methods in almost all cases in both probability guarantees and computation time.
ROFeb 24, 2022
Gaussian Belief Trees for Chance Constrained Asymptotically Optimal Motion PlanningQi Heng Ho, Zachary N. Sunberg, Morteza Lahijanian
In this paper, we address the problem of sampling-based motion planning under motion and measurement uncertainty with probabilistic guarantees. We generalize traditional sampling-based tree-based motion planning algorithms for deterministic systems and propose belief-$\mathcal{A}$, a framework that extends any kinodynamical tree-based planner to the belief space for linear (or linearizable) systems. We introduce appropriate sampling techniques and distance metrics for the belief space that preserve the probabilistic completeness and asymptotic optimality properties of the underlying planner. We demonstrate the efficacy of our approach for finding safe low-cost paths efficiently and asymptotically optimally in simulation, for both holonomic and non-holonomic systems.
AIFeb 20, 2022
Conflict-Based Search for Explainable Multi-Agent Path FindingJustin Kottinger, Shaull Almagor, Morteza Lahijanian
In the Multi-Agent Path Finding (MAPF) problem, the goal is to find non-colliding paths for agents in an environment, such that each agent reaches its goal from its initial location. In safety-critical applications, a human supervisor may want to verify that the plan is indeed collision-free. To this end, a recent work introduces a notion of explainability for MAPF based on a visualization of the plan as a short sequence of images representing time segments, where in each time segment the trajectories of the agents are disjoint. Then, the explainable MAPF problem asks for a set of non-colliding paths that admits a short-enough explanation. Explainable MAPF adds a new difficulty to MAPF, in that it is NP-hard with respect to the size of the environment, and not just the number of agents. Thus, traditional MAPF algorithms are not equipped to directly handle explainable-MAPF. In this work, we adapt Conflict Based Search (CBS), a well-studied algorithm for MAPF, to handle explainable MAPF. We show how to add explainability constraints on top of the standard CBS tree and its underlying A* search. We examine the usefulness of this approach and, in particular, the tradeoff between planning time and explainability.
SYDec 31, 2021
Formal Verification of Unknown Dynamical Systems via Gaussian Process RegressionJohn Skovbekk, Luca Laurenti, Eric Frew et al.
Leveraging autonomous systems in safety-critical scenarios requires verifying their behaviors in the presence of uncertainties and black-box components that influence the system dynamics. In this work, we develop a framework for verifying discrete-time dynamical systems with unmodelled dynamics and noisy measurements against temporal logic specifications from an input-output dataset. The verification framework employs Gaussian process (GP) regression to learn the unknown dynamics from the dataset and abstracts the continuous-space system as a finite-state, uncertain Markov decision process (MDP). This abstraction relies on space discretization and transition probability intervals that capture the uncertainty due to the error in GP regression by using reproducible kernel Hilbert space analysis as well as the uncertainty induced by discretization. The framework utilizes existing model checking tools for verification of the uncertain MDP abstraction against a given temporal logic specification. We establish the correctness of extending the verification results on the abstraction created from noisy measurements to the underlying system. We show that the computational complexity of the framework is polynomial in the size of the dataset and discrete abstraction. The complexity analysis illustrates a trade-off between the quality of the verification results and the computational burden to handle larger datasets and finer abstractions. Finally, we demonstrate the efficacy of our learning and verification framework on several case studies with linear, nonlinear, and switched dynamical systems.
ROOct 30, 2020
MAPS-X: Explainable Multi-Robot Motion Planning via SegmentationJustin Kottinger, Shaull Almagor, Morteza Lahijanian
Traditional multi-robot motion planning (MMP) focuses on computing trajectories for multiple robots acting in an environment, such that the robots do not collide when the trajectories are taken simultaneously. In safety-critical applications, a human supervisor may want to verify that the plan is indeed collision-free. In this work, we propose a notion of explanation for a plan of MMP, based on visualization of the plan as a short sequence of images representing time segments, where in each time segment the trajectories of the agents are disjoint, clearly illustrating the safety of the plan. We show that standard notions of optimality (e.g., makespan) may create conflict with short explanations. Thus, we propose meta-algorithms, namely multi-agent plan segmenting-X (MAPS-X) and its lazy variant, that can be plugged on existing centralized sampling-based tree planners X to produce plans with good explanations using a desirable number of images. We demonstrate the efficacy of this explanation-planning scheme and extensively evaluate the performance of MAPS-X.
LOSep 23, 2020
LTLf Synthesis on Probabilistic SystemsAndrew M. Wells, Morteza Lahijanian, Lydia E. Kavraki et al.
Many systems are naturally modeled as Markov Decision Processes (MDPs), combining probabilities and strategic actions. Given a model of a system as an MDP and some logical specification of system behavior, the goal of synthesis is to find a policy that maximizes the probability of achieving this behavior. A popular choice for defining behaviors is Linear Temporal Logic (LTL). Policy synthesis on MDPs for properties specified in LTL has been well studied. LTL, however, is defined over infinite traces, while many properties of interest are inherently finite. Linear Temporal Logic over finite traces (LTLf) has been used to express such properties, but no tools exist to solve policy synthesis for MDP behaviors given finite-trace properties. We present two algorithms for solving this synthesis problem: the first via reduction of LTLf to LTL and the second using native tools for LTLf. We compare the scalability of these two approaches for synthesis and show that the native approach offers better scalability compared to existing automaton generation tools for LTL.
ROApr 26, 2020
Online Mapping and Motion Planning under Uncertainty for Safe Navigation in Unknown EnvironmentsÈric Pairet, Juan David Hernández, Marc Carreras et al.
Safe autonomous navigation is an essential and challenging problem for robots operating in highly unstructured or completely unknown environments. Under these conditions, not only robotic systems must deal with limited localisation information, but also their manoeuvrability is constrained by their dynamics and often suffer from uncertainty. In order to cope with these constraints, this manuscript proposes an uncertainty-based framework for mapping and planning feasible motions online with probabilistic safety-guarantees. The proposed approach deals with the motion, probabilistic safety, and online computation constraints by: (i) incrementally mapping the surroundings to build an uncertainty-aware representation of the environment, and (ii) iteratively (re)planning trajectories to goal that are kinodynamically feasible and probabilistically safe through a multi-layered sampling-based planner in the belief space. In-depth empirical analyses illustrate some important properties of this approach, namely, (a) the multi-layered planning strategy enables rapid exploration of the high-dimensional belief space while preserving asymptotic optimality and completeness guarantees, and (b) the proposed routine for probabilistic collision checking results in tighter probability bounds in comparison to other uncertainty-aware planners in the literature. Furthermore, real-world in-water experimental evaluation on a non-holonomic torpedo-shaped autonomous underwater vehicle and simulated trials in the Stairwell scenario of the DARPA Subterranean Challenge 2019 on a quadrotor unmanned aerial vehicle demonstrate the efficacy of the method as well as its suitability for systems with limited on-board computational power.
ROJul 22, 2019
Correct-by-Construction Advanced Driver Assistance Systems based on a Cognitive ArchitectureFrancisco Eiras, Morteza Lahijanian, Marta Kwiatkowska
Research into safety in autonomous and semi-autonomous vehicles has, so far, largely been focused on testing and validation through simulation. Due to the fact that failure of these autonomous systems is potentially life-endangering, formal methods arise as a complementary approach. This paper studies the application of formal methods to the verification of a human driver model built using the cognitive architecture ACT-R, and to the design of correct-by-construction Advanced Driver Assistance Systems (ADAS). The novelty lies in the integration of ACT-R in the formal analysis and an abstraction technique that enables finite representation of a large dimensional, continuous system in the form of a Markov process. The situation considered is a multi-lane highway driving scenario and the interactions that arise. The efficacy of the method is illustrated in two case studies with various driving conditions.
SYJul 6, 2017
Multi-objective Robust Strategy Synthesis for Interval Markov Decision ProcessesErnst Moritz Hahn, Vahid Hashemi, Holger Hermanns et al.
Interval Markov decision processes (IMDPs) generalise classical MDPs by having interval-valued transition probabilities. They provide a powerful modelling tool for probabilistic systems with an additional variation or uncertainty that prevents the knowledge of the exact transition probabilities. In this paper, we consider the problem of multi-objective robust strategy synthesis for interval MDPs, where the aim is to find a robust strategy that guarantees the satisfaction of multiple properties at the same time in face of the transition probability uncertainty. We first show that this problem is PSPACE-hard. Then, we provide a value iteration-based decision algorithm to approximate the Pareto set of achievable points. We finally demonstrate the practical effectiveness of our proposed approaches by applying them on several case studies using a prototypical tool.
ROSep 16, 2016
Resource-Performance Trade-off Analysis for Mobile Robot DesignMorteza Lahijanian, Maria Svorenova, Akshay A. Morye et al.
The design of mobile autonomous robots is challenging due to the limited on-board resources such as processing power and energy. A promising approach is to generate intelligent schedules that reduce the resource consumption while maintaining best performance, or more interestingly, to trade off reduced resource consumption for a slightly lower but still acceptable level of performance. In this paper, we provide a framework to aid designers in exploring such resource-performance trade-offs and finding schedules for mobile robots, guided by questions such as "what is the minimum resource budget required to achieve a given level of performance?" The framework is based on a quantitative multi-objective verification technique which, for a collection of possibly conflicting objectives, produces the Pareto front that contains all the optimal trade-offs that are achievable. The designer then selects a specific Pareto point based on the resource constraints and desired performance level, and a correct-by-construction schedule that meets those constraints is automatically generated. We demonstrate the efficacy of this framework on several robotic scenarios in both simulations and experiments with encouraging results.
ROFeb 10, 2012
Temporal Logic Motion Control using Actor-Critic MethodsXu Chu Ding, Jing Wang, Morteza Lahijanian et al.
In this paper, we consider the problem of deploying a robot from a specification given as a temporal logic statement about some properties satisfied by the regions of a large, partitioned environment. We assume that the robot has noisy sensors and actuators and model its motion through the regions of the environment as a Markov Decision Process (MDP). The robot control problem becomes finding the control policy maximizing the probability of satisfying the temporal logic task on the MDP. For a large environment, obtaining transition probabilities for each state-action pair, as well as solving the necessary optimization problem for the optimal policy are usually not computationally feasible. To address these issues, we propose an approximate dynamic programming framework based on a least-square temporal difference learning method of the actor-critic type. This framework operates on sample paths of the robot and optimizes a randomized control policy with respect to a small set of parameters. The transition probabilities are obtained only when needed. Hardware-in-the-loop simulations confirm that convergence of the parameters translates to an approximately optimal policy.