Jyotirmoy V. Deshmukh

RO
h-index29
28papers
351citations
Novelty48%
AI Score54

28 Papers

FLJul 1, 2022
Interactive Learning from Natural Language and Demonstrations using Signal Temporal Logic

Sara Mohammadinejad, Jesse Thomason, Jyotirmoy V. Deshmukh · uw

Natural language is an intuitive way for humans to communicate tasks to a robot. While natural language (NL) is ambiguous, real world tasks and their safety requirements need to be communicated unambiguously. Signal Temporal Logic (STL) is a formal logic that can serve as a versatile, expressive, and unambiguous formal language to describe robotic tasks. On one hand, existing work in using STL for the robotics domain typically requires end-users to express task specifications in STL, a challenge for non-expert users. On the other, translating from NL to STL specifications is currently restricted to specific fragments. In this work, we propose DIALOGUESTL, an interactive approach for learning correct and concise STL formulas from (often) ambiguous NL descriptions. We use a combination of semantic parsing, pre-trained transformer-based language models, and user-in-the-loop clarifications aided by a small number of user demonstrations to predict the best STL formula to encode NL task descriptions. An advantage of mapping NL to STL is that there has been considerable recent work on the use of reinforcement learning (RL) to identify control policies for robots. We show we can use Deep Q-Learning techniques to learn optimal policies from the learned STL specifications. We demonstrate that DIALOGUESTL is efficient, scalable, and robust, and has high accuracy in predicting the correct STL formula with a few number of demonstrations and a few interactions with an oracle user.

SYApr 24
Neuro-Symbolic Acceleration of MILP Motion Planning with Temporal Logic and Chance Constraints

Junyang Cai, Weimin Huang, Brendan Long et al.

Autonomous systems must solve motion planning problems subject to increasingly complex, time-sensitive, and uncertain missions. These problems often involve high-level task specifications, such as temporal logic or chance constraints, which require solving large-scale Mixed-Integer Linear Programs (MILPs). However, existing MILP-based planning methods suffer from high computational cost and limited scalability, hindering their real-time applicability. We propose to use a neuro-symbolic approach to accelerate MILP-based motion planning by leveraging machine learning techniques to guide the solver's symbolic search. Focusing on three representative classes of diverse planning problems - Signal Temporal Logic (STL) specifications, chance constraints formulated via Conformal Predictive Programming (CPP), and Capability Temporal Logic (CaTL) specifications - we demonstrate how graph neural network-based learning methods can guide traditional symbolic MILP solvers in solving challenging planning problems, including branching variable selection and solver parameter configuration. Through extensive experiments, we show that neuro-symbolic search techniques yield scalability gains. Our approach yields substantial improvements across all three classes of planning problems, achieving an average performance gain of about 20% over state-of-the-art solver across key metrics, including runtime and solution quality.

SYSep 17, 2023
Data-Driven Reachability Analysis of Stochastic Dynamical Systems with Conformal Inference

Navid Hashemi, Xin Qin, Lars Lindemann et al.

We consider data-driven reachability analysis of discrete-time stochastic dynamical systems using conformal inference. We assume that we are not provided with a symbolic representation of the stochastic system, but instead have access to a dataset of $K$-step trajectories. The reachability problem is to construct a probabilistic flowpipe such that the probability that a $K$-step trajectory can violate the bounds of the flowpipe does not exceed a user-specified failure probability threshold. The key ideas in this paper are: (1) to learn a surrogate predictor model from data, (2) to perform reachability analysis using the surrogate model, and (3) to quantify the surrogate model's incurred error using conformal inference in order to give probabilistic reachability guarantees. We focus on learning-enabled control systems with complex closed-loop dynamics that are difficult to model symbolically, but where state transition pairs can be queried, e.g., using a simulator. We demonstrate the applicability of our method on examples from the domain of learning-enabled cyber-physical systems.

ROJun 29, 2022
Formalizing and Evaluating Requirements of Perception Systems for Automated Vehicles using Spatio-Temporal Perception Logic

Mohammad Hekmatnejad, Bardh Hoxha, Jyotirmoy V. Deshmukh et al.

Automated vehicles (AV) heavily depend on robust perception systems. Current methods for evaluating vision systems focus mainly on frame-by-frame performance. Such evaluation methods appear to be inadequate in assessing the performance of a perception subsystem when used within an AV. In this paper, we present a logic -- referred to as Spatio-Temporal Perception Logic (STPL) -- which utilizes both spatial and temporal modalities. STPL enables reasoning over perception data using spatial and temporal operators. One major advantage of STPL is that it facilitates basic sanity checks on the functional performance of the perception system, even without ground-truth data in some cases. We identify a fragment of STPL which is efficiently monitorable offline in polynomial time. Finally, we present a range of specifications for AV perception systems to highlight the types of requirements that can be expressed and analyzed through offline monitoring with STPL.

SYOct 14, 2022
Risk-Awareness in Learning Neural Controllers for Temporal Logic Objectives

Navid Hashemi, Xin Qin, Jyotirmoy V. Deshmukh et al.

In this paper, we consider the problem of synthesizing a controller in the presence of uncertainty such that the resulting closed-loop system satisfies certain hard constraints while optimizing certain (soft) performance objectives. We assume that the hard constraints encoding safety or mission-critical task objectives are expressed using Signal Temporal Logic (STL), while performance is quantified using standard cost functions on system trajectories. In order to prioritize the satisfaction of the hard STL constraints, we utilize the framework of control barrier functions (CBFs) and algorithmically obtain CBFs for STL objectives. We assume that the controllers are modeled using neural networks (NNs) and provide an optimization algorithm to learn the optimal parameters for the NN controller that optimize the performance at a user-specified robustness margin for the safety specifications. We use the formalism of risk measures to evaluate the risk incurred by the trade-off between robustness margin of the system and its performance. We demonstrate the efficacy of our approach on well-known difficult examples for nonlinear control such as a quad-rotor and a unicycle, where the mission objectives for each system include hard timing constraints and safety objectives.

ROJul 16, 2024
Statistical Reachability Analysis of Stochastic Cyber-Physical Systems under Distribution Shift

Navid Hashemi, Lars Lindemann, Jyotirmoy V. Deshmukh

Reachability analysis is a popular method to give safety guarantees for stochastic cyber-physical systems (SCPSs) that takes in a symbolic description of the system dynamics and uses set-propagation methods to compute an overapproximation of the set of reachable states over a bounded time horizon. In this paper, we investigate the problem of performing reachability analysis for an SCPS that does not have a symbolic description of the dynamics, but instead is described using a digital twin model that can be simulated to generate system trajectories. An important challenge is that the simulator implicitly models a probability distribution over the set of trajectories of the SCPS; however, it is typical to have a sim2real gap, i.e., the actual distribution of the trajectories in a deployment setting may be shifted from the distribution assumed by the simulator. We thus propose a statistical reachability analysis technique that, given a user-provided threshold $1-ε$, provides a set that guarantees that any reachable state during deployment lies in this set with probability not smaller than this threshold. Our method is based on three main steps: (1) learning a deterministic surrogate model from sampled trajectories, (2) conducting reachability analysis over the surrogate model, and (3) employing {\em robust conformal inference} using an additional set of sampled trajectories to quantify the surrogate model's distribution shift with respect to the deployed SCPS. To counter conservatism in reachable sets, we propose a novel method to train surrogate models that minimizes a quantile loss term (instead of the usual mean squared loss), and a new method that provides tighter guarantees using conformal inference using a normalized surrogate error. We demonstrate the effectiveness of our technique on various case studies.

ROApr 12, 2022
Learning Performance Graphs from Demonstrations via Task-Based Evaluations

Aniruddh G. Puranic, Jyotirmoy V. Deshmukh, Stefanos Nikolaidis

In the learning from demonstration (LfD) paradigm, understanding and evaluating the demonstrated behaviors plays a critical role in extracting control policies for robots. Without this knowledge, a robot may infer incorrect reward functions that lead to undesirable or unsafe control policies. Recent work has proposed an LfD framework where a user provides a set of formal task specifications to guide LfD, to address the challenge of reward shaping. However, in this framework, specifications are manually ordered in a performance graph (a partial order that specifies relative importance between the specifications). The main contribution of this paper is an algorithm to learn the performance graph directly from the user-provided demonstrations, and show that the reward functions generated using the learned performance graph generate similar policies to those from manually specified performance graphs. We perform a user study that shows that priorities specified by users on behaviors in a simulated highway driving domain match the automatically inferred performance graph. This establishes that we can accurately evaluate user demonstrations with respect to task specifications without expert criteria.

MADec 5, 2022
Multi Agent Path Finding using Evolutionary Game Theory

Sheryl Paul, Jyotirmoy V. Deshmukh

In this paper, we consider the problem of path finding for a set of homogeneous and autonomous agents navigating a previously unknown stochastic environment. In our problem setting, each agent attempts to maximize a given utility function while respecting safety properties. Our solution is based on ideas from evolutionary game theory, namely replicating policies that perform well and diminishing ones that do not. We do a comprehensive comparison with related multiagent planning methods, and show that our technique beats state of the art RL algorithms in minimizing path length by nearly 30% in large spaces. We show that our algorithm is computationally faster than deep RL methods by at least an order of magnitude. We also show that it scales better with an increase in the number of agents as compared to other methods, path planning methods in particular. Lastly, we empirically prove that the policies that we learn are evolutionarily stable and thus impervious to invasion by any other policy.

RONov 9, 2023
Signal Temporal Logic-Guided Apprenticeship Learning

Aniruddh G. Puranic, Jyotirmoy V. Deshmukh, Stefanos Nikolaidis

Apprenticeship learning crucially depends on effectively learning rewards, and hence control policies from user demonstrations. Of particular difficulty is the setting where the desired task consists of a number of sub-goals with temporal dependencies. The quality of inferred rewards and hence policies are typically limited by the quality of demonstrations, and poor inference of these can lead to undesirable outcomes. In this letter, we show how temporal logic specifications that describe high level task objectives, are encoded in a graph to define a temporal-based metric that reasons about behaviors of demonstrators and the learner agent to improve the quality of inferred rewards and policies. Through experiments on a diverse set of robot manipulator simulations, we show how our framework overcomes the drawbacks of prior literature by drastically improving the number of demonstrations required to learn a control policy.

SIMar 17
On Online Control of Opinion Dynamics

Sheryl Paul, Leslie Cruz Juarez, Jyotirmoy V. Deshmukh et al.

Networked multi-agent dynamical systems have been used to model how individual opinions evolve over time due to the opinions of other agents in the network. Particularly, such a model has been used to study how a planning agent can be used to steer opinions in a desired direction through repeated, budgeted interventions. In this paper, we consider the problem where individuals' susceptibilities to external influences are unknown. We propose an online algorithm that alternates between estimating this susceptibility parameter, and using the current estimate to drive the opinion to a desired target. We provide conditions that guarantee stability and convergence to the desired target opinion when the planning agent faces budgetary or temporal constraints. Our analysis shows that the key advantage of estimating the susceptibility parameter is that it helps achieve near-optimal convergence to the target opinion given a finite amount of intervention rounds, and, for a given intervention budget, quantifies how close the opinion can get to the desired target.

ROMay 18
Guiding Neuro-Symbolic Scenario Generation with Spatio-Temporal Logic

Lorenzo Bonin, Francesco Giacomarra, Luca Bortolussi et al.

The rapid advancement of autonomous driving (AD) technologies has outpaced the development of robust safety evaluation methods. Conventional testing relies on exposing AD systems to vast numbers of real-world traffic scenes -- a brute-force approach that is prohibitively expensive and statistically ineffective at capturing the rare, safety-critical edge cases essential for validating real-world robustness. To address this fundamental limitation, we introduce STRELGen, a scalable framework for the targeted generation of safety-critical driving scenarios. STRELGen synergistically combines a multi-agent trajectory-generation diffusion model (DM) with Spatio-Temporal Logic (STREL) specifications that encode complex safety and realism properties through a highly interpretable formalism. Crucially, monitoring satisfaction levels of these specifications is differentiable, enabling gradient-based search. At inference time, we optimize directly over the DM latent space to maximize STREL formula satisfaction. The result is efficient generation of highly plausible yet safety-critical multi-agent scenarios that lie within the learned data distribution. STRELGen thus provides a flexible, interpretable, and powerful tool for stress-testing autonomous driving systems, moving beyond the limitations of brute-force data collection.

SYFeb 12, 2024
Conformal Predictive Programming for Chance Constrained Optimization

Yiqi Zhao, Xinyi Yu, Matteo Sesia et al.

We propose conformal predictive programming (CPP), a framework to solve chance constrained optimization problems, i.e., optimization problems with constraints that are functions of random variables. CPP utilizes samples from these random variables along with the quantile lemma - central to conformal prediction - to transform the chance constrained optimization problem into a deterministic problem with a quantile reformulation. CPP inherits a priori guarantees on constraint satisfaction from existing sample average approximation approaches for a class of chance constrained optimization problems, and it provides a posteriori guarantees that are of conditional and marginal nature otherwise. The strength of CPP is that it can easily support different variants of conformal prediction which have been (or will be) proposed within the conformal prediction community. To illustrate this, we present robust CPP to deal with distribution shifts in the random variables and Mondrian CPP to deal with class conditional chance constraints. To enable tractable solutions to the quantile reformulation, we present a mixed integer programming method (CPP-MIP) encoding, a bilevel optimization strategy (CPP-Bilevel), and a sampling-and-discarding optimization strategy (CPP-Discarding). We also extend CPP to deal with joint chance constrained optimization (JCCO). In a series of case studies, we show the validity of the aforementioned approaches, empirically compare CPP-MIP, CPP-Bilevel, as well as CPP-Discarding, and illustrate the advantage of CPP as compared to scenario approach.

CLApr 22, 2025
ConformalNL2LTL: Translating Natural Language Instructions into Temporal Logic Formulas with Conformal Correctness Guarantees

Jun Wang, David Smith Sundarsingh, Jyotirmoy V. Deshmukh et al.

Linear Temporal Logic (LTL) has become a prevalent specification language for robotic tasks. To mitigate the significant manual effort and expertise required to define LTL-encoded tasks, several methods have been proposed for translating Natural Language (NL) instructions into LTL formulas, which, however, lack correctness guarantees. To address this, we introduce a new NL-to-LTL translation method, called ConformalNL2LTL, that can achieve user-defined translation success rates over unseen NL commands. Our method constructs LTL formulas iteratively by addressing a sequence of open-vocabulary Question-Answering (QA) problems with LLMs. To enable uncertainty-aware translation, we leverage conformal prediction (CP), a distribution-free uncertainty quantification tool for black-box models. CP enables our method to assess the uncertainty in LLM-generated answers, allowing it to proceed with translation when sufficiently confident and request help otherwise. We provide both theoretical and empirical results demonstrating that ConformalNL2LTL achieves user-specified translation accuracy while minimizing help rates.

ROSep 26, 2025
Self-driving cars: Are we there yet?

Merve Atasever, Zhuochen Liu, Qingpei Li et al.

Autonomous driving remains a highly active research domain that seeks to enable vehicles to perceive dynamic environments, predict the future trajectories of traffic agents such as vehicles, pedestrians, and cyclists and plan safe and efficient future motions. To advance the field, several competitive platforms and benchmarks have been established to provide standardized datasets and evaluation protocols. Among these, leaderboards by the CARLA organization and nuPlan and the Waymo Open Dataset have become leading benchmarks for assessing motion planning algorithms. Each offers a unique dataset and challenging planning problems spanning a wide range of driving scenarios and conditions. In this study, we present a comprehensive comparative analysis of the motion planning methods featured on these three leaderboards. To ensure a fair and unified evaluation, we adopt CARLA leaderboard v2.0 as our common evaluation platform and modify the selected models for compatibility. By highlighting the strengths and weaknesses of current approaches, we identify prevailing trends, common challenges, and suggest potential directions for advancing motion planning research.

MASep 26, 2025
Multi-Agent Path Finding via Offline RL and LLM Collaboration

Merve Atasever, Matthew Hong, Mihir Nitin Kulkarni et al.

Multi-Agent Path Finding (MAPF) poses a significant and challenging problem critical for applications in robotics and logistics, particularly due to its combinatorial complexity and the partial observability inherent in realistic environments. Decentralized reinforcement learning methods commonly encounter two substantial difficulties: first, they often yield self-centered behaviors among agents, resulting in frequent collisions, and second, their reliance on complex communication modules leads to prolonged training times, sometimes spanning weeks. To address these challenges, we propose an efficient decentralized planning framework based on the Decision Transformer (DT), uniquely leveraging offline reinforcement learning to substantially reduce training durations from weeks to mere hours. Crucially, our approach effectively handles long-horizon credit assignment and significantly improves performance in scenarios with sparse and delayed rewards. Furthermore, to overcome adaptability limitations inherent in standard RL methods under dynamic environmental changes, we integrate a large language model (GPT-4o) to dynamically guide agent policies. Extensive experiments in both static and dynamically changing environments demonstrate that our DT-based approach, augmented briefly by GPT-4o, significantly enhances adaptability and performance.

AISep 1, 2025
Conformal Predictive Monitoring for Multi-Modal Scenarios

Francesca Cairoli, Luca Bortolussi, Jyotirmoy V. Deshmukh et al.

We consider the problem of quantitative predictive monitoring (QPM) of stochastic systems, i.e., predicting at runtime the degree of satisfaction of a desired temporal logic property from the current state of the system. Since computational efficiency is key to enable timely intervention against predicted violations, several state-of-the-art QPM approaches rely on fast machine-learning surrogates to provide prediction intervals for the satisfaction values, using conformal inference to offer statistical guarantees. However, these QPM methods suffer when the monitored agent exhibits multi-modal dynamics, whereby certain modes may yield high satisfaction values while others critically violate the property. Existing QPM methods are mode-agnostic and so would yield overly conservative and uninformative intervals that lack meaningful mode-specific satisfaction information. To address this problem, we present GenQPM, a method that leverages deep generative models, specifically score-based diffusion models, to reliably approximate the probabilistic and multi-modal system dynamics without requiring explicit model access. GenQPM employs a mode classifier to partition the predicted trajectories by dynamical mode. For each mode, we then apply conformal inference to produce statistically valid, mode-specific prediction intervals. We demonstrate the effectiveness of GenQPM on a benchmark of agent navigation and autonomous driving tasks, resulting in prediction intervals that are significantly more informative (less conservative) than mode-agnostic baselines.

ROApr 18, 2025
Coordinating Spinal and Limb Dynamics for Enhanced Sprawling Robot Mobility

Merve Atasever, Ali Okhovat, Azhang Nazaripouya et al.

Among vertebrates, salamanders, with their unique ability to transition between walking and swimming gaits, highlight the role of spinal mobility in locomotion. A flexible spine enables undulation of the body through a wavelike motion along the spine, aiding navigation over uneven terrains and obstacles. Yet environmental uncertainties, such as surface irregularities and variations in friction, can significantly disrupt body-limb coordination and cause discrepancies between predictions from mathematical models and real-world outcomes. Addressing this challenge requires the development of sophisticated control strategies capable of dynamically adapting to uncertain conditions while maintaining efficient locomotion. Deep reinforcement learning (DRL) offers a promising framework for handling non-deterministic environments and enabling robotic systems to adapt effectively and perform robustly under challenging conditions. In this study, we comparatively examine learning-based control strategies and biologically inspired gait design methods on a salamander-like robot.

LGOct 22, 2024
Survival of the Fittest: Evolutionary Adaptation of Policies for Environmental Shifts

Sheryl Paul, Jyotirmoy V. Deshmukh

Reinforcement learning (RL) has been successfully applied to solve the problem of finding obstacle-free paths for autonomous agents operating in stochastic and uncertain environments. However, when the underlying stochastic dynamics of the environment experiences drastic distribution shifts, the optimal policy obtained in the trained environment may be sub-optimal or may entirely fail in helping find goal-reaching paths for the agent. Approaches like domain randomization and robust RL can provide robust policies, but typically assume minor (bounded) distribution shifts. For substantial distribution shifts, retraining (either with a warm-start policy or from scratch) is an alternative approach. In this paper, we develop a novel approach called {\em Evolutionary Robust Policy Optimization} (ERPO), an adaptive re-training algorithm inspired by evolutionary game theory (EGT). ERPO learns an optimal policy for the shifted environment iteratively using a temperature parameter that controls the trade off between exploration and adherence to the old optimal policy. The policy update itself is an instantiation of the replicator dynamics used in EGT. We show that under fairly common sparsity assumptions on rewards in such environments, ERPO converges to the optimal policy in the shifted environment. We empirically demonstrate that for path finding tasks in a number of environments, ERPO outperforms several popular RL and deep RL algorithms (PPO, A3C, DQN) in many scenarios and popular environments. This includes scenarios where the RL algorithms are allowed to train from scratch in the new environment, when they are retrained on the new environment, or when they are used in conjunction with domain randomization. ERPO shows faster policy adaptation, higher average rewards, and reduced computational costs in policy adaptation.

AIFeb 4, 2022
Model-Free Reinforcement Learning for Symbolic Automata-encoded Objectives

Anand Balakrishnan, Stefan Jakšić, Edgar A. Aguilar et al.

Reinforcement learning (RL) is a popular approach for robotic path planning in uncertain environments. However, the control policies trained for an RL agent crucially depend on user-defined, state-based reward functions. Poorly designed rewards can lead to policies that do get maximal rewards but fail to satisfy desired task objectives or are unsafe. There are several examples of the use of formal languages such as temporal logics and automata to specify high-level task specifications for robots (in lieu of Markovian rewards). Recent efforts have focused on inferring state-based rewards from formal specifications; here, the goal is to provide (probabilistic) guarantees that the policy learned using RL (with the inferred rewards) satisfies the high-level formal specification. A key drawback of several of these techniques is that the rewards that they infer are sparse: the agent receives positive rewards only upon completion of the task and no rewards otherwise. This naturally leads to poor convergence properties and high variance during RL. In this work, we propose using formal specifications in the form of symbolic automata: these serve as a generalization of both bounded-time temporal logic-based specifications as well as automata. Furthermore, our use of symbolic automata allows us to define non-sparse potential-based rewards which empirically shape the reward surface, leading to better convergence during RL. We also show that our potential-based rewarding strategy still allows us to obtain the policy that maximizes the satisfaction of the given specification.

LGJul 29, 2021
Non-Markovian Reinforcement Learning using Fractional Dynamics

Gaurav Gupta, Chenzhong Yin, Jyotirmoy V. Deshmukh et al.

Reinforcement learning (RL) is a technique to learn the control policy for an agent that interacts with a stochastic environment. In any given state, the agent takes some action, and the environment determines the probability distribution over the next state as well as gives the agent some reward. Most RL algorithms typically assume that the environment satisfies Markov assumptions (i.e. the probability distribution over the next state depends only on the current state). In this paper, we propose a model-based RL technique for a system that has non-Markovian dynamics. Such environments are common in many real-world applications such as in human physiology, biological systems, material science, and population dynamics. Model-based RL (MBRL) techniques typically try to simultaneously learn a model of the environment from the data, as well as try to identify an optimal policy for the learned model. We propose a technique where the non-Markovianity of the system is modeled through a fractional dynamical system. We show that we can quantify the difference in the performance of an MBRL algorithm that uses bounded horizon model predictive control from the optimal policy. Finally, we demonstrate our proposed framework on a pharmacokinetic model of human blood glucose dynamics and show that our fractional models can capture distant correlations on real-world datasets.

ROFeb 15, 2021
Learning from Demonstrations using Signal Temporal Logic

Aniruddh G. Puranic, Jyotirmoy V. Deshmukh, Stefanos Nikolaidis

Learning-from-demonstrations is an emerging paradigm to obtain effective robot control policies for complex tasks via reinforcement learning without the need to explicitly design reward functions. However, it is susceptible to imperfections in demonstrations and also raises concerns of safety and interpretability in the learned control policies. To address these issues, we use Signal Temporal Logic to evaluate and rank the quality of demonstrations. Temporal logic-based specifications allow us to create non-Markovian rewards, and also define interesting causal dependencies between tasks such as sequential task specifications. We validate our approach through experiments on discrete-world and OpenAI Gym environments, and show that our approach outperforms the state-of-the-art Maximum Causal Entropy Inverse Reinforcement Learning.

RONov 10, 2020
Model-based Reinforcement Learning from Signal Temporal Logic Specifications

Parv Kapoor, Anand Balakrishnan, Jyotirmoy V. Deshmukh

Techniques based on Reinforcement Learning (RL) are increasingly being used to design control policies for robotic systems. RL fundamentally relies on state-based reward functions to encode desired behavior of the robot and bad reward functions are prone to exploitation by the learning agent, leading to behavior that is undesirable in the best case and critically dangerous in the worst. On the other hand, designing good reward functions for complex tasks is a challenging problem. In this paper, we propose expressing desired high-level robot behavior using a formal specification language known as Signal Temporal Logic (STL) as an alternative to reward/cost functions. We use STL specifications in conjunction with model-based learning to design model predictive controllers that try to optimize the satisfaction of the STL specification over a finite time horizon. The proposed algorithm is empirically evaluated on simulations of robotic system such as a pick-and-place robotic arm, and adaptive cruise control for autonomous vehicles.

LGJul 20, 2020
DiffRNN: Differential Verification of Recurrent Neural Networks

Sara Mohammadinejad, Brandon Paulsen, Chao Wang et al.

Recurrent neural networks (RNNs) such as Long Short Term Memory (LSTM) networks have become popular in a variety of applications such as image processing, data classification, speech recognition, and as controllers in autonomous systems. In practical settings, there is often a need to deploy such RNNs on resource-constrained platforms such as mobile phones or embedded devices. As the memory footprint and energy consumption of such components become a bottleneck, there is interest in compressing and optimizing such networks using a range of heuristic techniques. However, these techniques do not guarantee the safety of the optimized network, e.g., against adversarial inputs, or equivalence of the optimized and original networks. To address this problem, we propose DIFFRNN, the first differential verification method for RNNs to certify the equivalence of two structurally similar neural networks. Existing work on differential verification for ReLUbased feed-forward neural networks does not apply to RNNs where nonlinear activation functions such as Sigmoid and Tanh cannot be avoided. RNNs also pose unique challenges such as handling sequential inputs, complex feedback structures, and interactions between the gates and states. In DIFFRNN, we overcome these challenges by bounding nonlinear activation functions with linear constraints and then solving constrained optimization problems to compute tight bounding boxes on nonlinear surfaces in a high-dimensional space. The soundness of these bounding boxes is then proved using the dReal SMT solver. We demonstrate the practical efficacy of our technique on a variety of benchmarks and show that DIFFRNN outperforms state-of-the-art RNN verification tools such as POPQORN.

LGMay 18, 2020
Mining Environment Assumptions for Cyber-Physical System Models

Sara Mohammadinejad, Jyotirmoy V. Deshmukh, Aniruddh G. Puranic

Many complex cyber-physical systems can be modeled as heterogeneous components interacting with each other in real-time. We assume that the correctness of each component can be specified as a requirement satisfied by the output signals produced by the component, and that such an output guarantee is expressed in a real-time temporal logic such as Signal Temporal Logic (STL). In this paper, we hypothesize that a large subset of input signals for which the corresponding output signals satisfy the output requirement can also be compactly described using an STL formula that we call the environment assumption. We propose an algorithm to mine such an environment assumption using a supervised learning technique. Essentially, our algorithm treats the environment assumption as a classifier that labels input signals as good if the corresponding output signal satisfies the output requirement, and as bad otherwise. Our learning method simultaneously learns the structure of the STL formula as well as the values of the numeric constants appearing in the formula. To achieve this, we combine a procedure to systematically enumerate candidate Parametric STL (PSTL) formulas, with a decision-tree based approach to learn parameter values. We demonstrate experimental results on real world data from several domains including transportation and health care.

LGJul 24, 2019
Interpretable Classification of Time-Series Data using Efficient Enumerative Techniques

Sara Mohammadinejad, Jyotirmoy V. Deshmukh, Aniruddh G. Puranic et al.

Cyber-physical system applications such as autonomous vehicles, wearable devices, and avionic systems generate a large volume of time-series data. Designers often look for tools to help classify and categorize the data. Traditional machine learning techniques for time-series data offer several solutions to solve these problems; however, the artifacts trained by these algorithms often lack interpretability. On the other hand, temporal logics, such as Signal Temporal Logic (STL) have been successfully used in the formal methods community as specifications of time-series behaviors. In this work, we propose a new technique to automatically learn temporal logic formulae that are able to cluster and classify real-valued time-series data. Previous work on learning STL formulas from data either assumes a formula-template to be given by the user, or assumes some special fragment of STL that enables exploring the formula structure in a systematic fashion. In our technique, we relax these assumptions, and provide a way to systematically explore the space of all STL formulas. As the space of all STL formulas is very large, and contains many semantically equivalent formulas, we suggest a technique to heuristically prune the space of formulas considered. Finally, we illustrate our technique on various case studies from the automotive, transportation and healthcare domain.

SYApr 11, 2018
Reasoning about Safety of Learning-Enabled Components in Autonomous Cyber-physical Systems

Cumhur Erkan Tuncali, James Kapinski, Hisahiro Ito et al.

We present a simulation-based approach for generating barrier certificate functions for safety verification of cyber-physical systems (CPS) that contain neural network-based controllers. A linear programming solver is utilized to find a candidate generator function from a set of simulation traces obtained by randomly selecting initial states for the CPS model. A level set of the generator function is then selected to act as a barrier certificate for the system, meaning it demonstrates that no unsafe system states are reachable from a given set of initial states. The barrier certificate properties are verified with an SMT solver. This approach is demonstrated on a case study in which a Dubins car model of an autonomous vehicle is controlled by a neural network to follow a given path.

LGFeb 24, 2018
Time Series Learning using Monotonic Logical Properties

Marcell Vazquez-Chanlatte, Shromona Ghosh, Jyotirmoy V. Deshmukh et al.

Cyber-physical systems of today are generating large volumes of time-series data. As manual inspection of such data is not tractable, the need for learning methods to help discover logical structure in the data has increased. We propose a logic-based framework that allows domain-specific knowledge to be embedded into formulas in a parametric logical specification over time-series data. The key idea is to then map a time series to a surface in the parameter space of the formula. Given this mapping, we identify the Hausdorff distance between boundaries as a natural distance metric between two time-series data under the lens of the parametric specification. This enables embedding non-trivial domain-specific knowledge into the distance metric and then using off-the-shelf machine learning tools to label the data. After labeling the data, we demonstrate how to extract a logical specification for each label. Finally, we showcase our technique on real world traffic data to learn classifiers/monitors for slow-downs and traffic jams.

LGDec 22, 2016
Logic-based Clustering and Learning for Time-Series Data

Marcell Vazquez-Chanlatte, Jyotirmoy V. Deshmukh, Xiaoqing Jin et al.

To effectively analyze and design cyberphysical systems (CPS), designers today have to combat the data deluge problem, i.e., the burden of processing intractably large amounts of data produced by complex models and experiments. In this work, we utilize monotonic Parametric Signal Temporal Logic (PSTL) to design features for unsupervised classification of time series data. This enables using off-the-shelf machine learning tools to automatically cluster similar traces with respect to a given PSTL formula. We demonstrate how this technique produces interpretable formulas that are amenable to analysis and understanding using a few representative examples. We illustrate this with case studies related to automotive engine testing, highway traffic analysis, and auto-grading massively open online courses.