FLJun 29, 2023Code
Attack-Resilient Supervisory Control of Discrete-Event Systems: A Finite-State Transducer ApproachYu Wang, Alper Kamil Bozkurt, Nathan Smith et al.
Resilience to sensor and actuator attacks is a major concern in the supervisory control of discrete events in cyber-physical systems (CPS). In this work, we propose a new framework to design supervisors for CPS under attacks using finite-state transducers (FSTs) to model the effects of the discrete events. FSTs can capture a general class of regular-rewriting attacks in which an attacker can nondeterministically rewrite sensing/actuation events according to a given regular relation. These include common insertion, deletion, event-wise replacement, and finite-memory replay attacks. We propose new theorems and algorithms with polynomial complexity to design resilient supervisors against these attacks. We also develop an open-source tool in Python based on the results and illustrate its applicability through a case study
54.8IVJun 3
Scaling Datasets for Multi-Sensor, Multi-Agent, and Multi-Domain Learning in Autonomous SystemsR. Spencer Hallyburton, David Hunt, Miroslav Pajic
Existing datasets cannot support large-scale learning in multi-agent, multi-sensor, or multi-domain autonomy, where diversity and coordination are essential. We present a modular dataset generation pipeline that creates terabyte-scale, ground-truth-labeled data for ground, aerial, and infrastructure-based systems using the AVstack framework and CARLA simulator. Supporting single- and multi-agent configurations with flexible sensor suites, the pipeline enables controllable experimentation across challenging conditions. Representative perception and fusion studies show how generated data can support application-specific training and collaborative autonomy.
LGFeb 5, 2023
Offline Learning of Closed-Loop Deep Brain Stimulation Controllers for Parkinson Disease TreatmentQitong Gao, Stephen L. Schimdt, Afsana Chowdhury et al.
Deep brain stimulation (DBS) has shown great promise toward treating motor symptoms caused by Parkinson's disease (PD), by delivering electrical pulses to the Basal Ganglia (BG) region of the brain. However, DBS devices approved by the U.S. Food and Drug Administration (FDA) can only deliver continuous DBS (cDBS) stimuli at a fixed amplitude; this energy inefficient operation reduces battery lifetime of the device, cannot adapt treatment dynamically for activity, and may cause significant side-effects (e.g., gait impairment). In this work, we introduce an offline reinforcement learning (RL) framework, allowing the use of past clinical data to train an RL policy to adjust the stimulation amplitude in real time, with the goal of reducing energy use while maintaining the same level of treatment (i.e., control) efficacy as cDBS. Moreover, clinical protocols require the safety and performance of such RL controllers to be demonstrated ahead of deployments in patients. Thus, we also introduce an offline policy evaluation (OPE) method to estimate the performance of RL policies using historical data, before deploying them on patients. We evaluated our framework on four PD patients equipped with the RC+S DBS system, employing the RL controllers during monthly clinical visits, with the overall control efficacy evaluated by severity of symptoms (i.e., bradykinesia and tremor), changes in PD biomakers (i.e., local field potentials), and patient ratings. The results from clinical experiments show that our RL-based controller maintains the same level of control efficacy as cDBS, but with significantly reduced stimulation energy. Further, the OPE method is shown effective in accurately estimating and ranking the expected returns of RL controllers.
LGJun 12, 2023
Robust Reinforcement Learning through Efficient Adversarial HerdingJuncheng Dong, Hao-Lun Hsu, Qitong Gao et al.
Although reinforcement learning (RL) is considered the gold standard for policy design, it may not always provide a robust solution in various scenarios. This can result in severe performance degradation when the environment is exposed to potential disturbances. Adversarial training using a two-player max-min game has been proven effective in enhancing the robustness of RL agents. In this work, we extend the two-player game by introducing an adversarial herd, which involves a group of adversaries, in order to address ($\textit{i}$) the difficulty of the inner optimization problem, and ($\textit{ii}$) the potential over pessimism caused by the selection of a candidate adversary set that may include unlikely scenarios. We first prove that adversarial herds can efficiently approximate the inner optimization problem. Then we address the second issue by replacing the worst-case performance in the inner optimization with the average performance over the worst-$k$ adversaries. We evaluate the proposed method on multiple MuJoCo environments. Experimental results demonstrate that our approach consistently generates more robust policies.
LGJan 28, 2023
Variational Latent Branching Model for Off-Policy EvaluationQitong Gao, Ge Gao, Min Chi et al.
Model-based methods have recently shown great potential for off-policy evaluation (OPE); offline trajectories induced by behavioral policies are fitted to transitions of Markov decision processes (MDPs), which are used to rollout simulated trajectories and estimate the performance of policies. Model-based OPE methods face two key challenges. First, as offline trajectories are usually fixed, they tend to cover limited state and action space. Second, the performance of model-based methods can be sensitive to the initialization of their parameters. In this work, we propose the variational latent branching model (VLBM) to learn the transition function of MDPs by formulating the environmental dynamics as a compact latent space, from which the next states and rewards are then sampled. Specifically, VLBM leverages and extends the variational inference framework with the recurrent state alignment (RSA), which is designed to capture as much information underlying the limited training data, by smoothing out the information flow between the variational (encoding) and generative (decoding) part of VLBM. Moreover, we also introduce the branching architecture to improve the model's robustness against randomly initialized model weights. The effectiveness of the VLBM is evaluated on the deep OPE (DOPE) benchmark, from which the training trajectories are designed to result in varied coverage of the state-action space. We show that the VLBM outperforms existing state-of-the-art OPE methods in general.
MEJun 20, 2023
Treatment Effects in Extreme RegimesAhmed Aloui, Ali Hasan, Yuting Ng et al.
Understanding treatment effects in extreme regimes is important for characterizing risks associated with different interventions. This is hindered by the unavailability of counterfactual outcomes and the rarity and difficulty of collecting extreme data in practice. To address this issue, we propose a new framework based on extreme value theory for estimating treatment effects in extreme regimes. We quantify these effects using variations in tail decay rates of potential outcomes in the presence and absence of treatments. We establish algorithms for calculating these quantities and develop related theoretical results. We demonstrate the efficacy of our approach on various standard synthetic and semi-synthetic datasets.
MLMay 25, 2022
Transportation-Inequalities, Lyapunov Stability and Sampling for Dynamical Systems on Continuous State SpaceMuhammad Abdullah Naeem, Miroslav Pajic
We study the concentration phenomenon for discrete-time random dynamical systems with an unbounded state space. We develop a heuristic approach towards obtaining exponential concentration inequalities for dynamical systems using an entirely functional analytic framework. We also show that existence of exponential-type Lyapunov function, compared to the purely deterministic setting, not only implies stability but also exponential concentration inequalities for sampling from the stationary distribution, via \emph{transport-entropy inequality} (T-E). These results have significant impact in \emph{reinforcement learning} (RL) and \emph{controls}, leading to exponential concentration inequalities even for unbounded observables, while neither assuming reversibility nor exact knowledge of random dynamical system (assumptions at heart of concentration inequalities in statistical mechanics and Markov diffusion processes).
LGOct 11, 2023
Off-Policy Evaluation for Human FeedbackQitong Gao, Ge Gao, Juncheng Dong et al.
Off-policy evaluation (OPE) is important for closing the gap between offline training and evaluation of reinforcement learning (RL), by estimating performance and/or rank of target (evaluation) policies using offline trajectories only. It can improve the safety and efficiency of data collection and policy testing procedures in situations where online deployments are expensive, such as healthcare. However, existing OPE methods fall short in estimating human feedback (HF) signals, as HF may be conditioned over multiple underlying factors and is only sparsely available; as opposed to the agent-defined environmental rewards (used in policy optimization), which are usually determined over parametric functions or distributions. Consequently, the nature of HF signals makes extrapolating accurate OPE estimations to be challenging. To resolve this, we introduce an OPE for HF (OPEHF) framework that revives existing OPE methods in order to accurately evaluate the HF signals. Specifically, we develop an immediate human reward (IHR) reconstruction approach, regularized by environmental knowledge distilled in a latent space that captures the underlying dynamics of state transitions as well as issuing HF signals. Our approach has been tested over two real-world experiments, adaptive in-vivo neurostimulation and intelligent tutoring, as well as in a simulation environment (visual Q&A). Results show that our approach significantly improves the performance toward estimating HF signals accurately, compared to directly applying (variants of) existing OPE methods.
STOct 16, 2023
From Spectral Theorem to Statistical Independence with Application to System IdentificationMuhammad Abdullah Naeem, Amir Khazraei, Miroslav Pajic
High dimensional random dynamical systems are ubiquitous, including -- but not limited to -- cyber-physical systems, daily return on different stocks of S&P 1500 and velocity profile of interacting particle systems around McKeanVlasov limit. Mathematically, underlying phenomenon can be captured via a stable $n$-dimensional linear transformation `$A$' and additive randomness. System identification aims at extracting useful information about underlying dynamical system, given a length $N$ trajectory from it (corresponds to an $n \times N$ dimensional data matrix). We use spectral theorem for non-Hermitian operators to show that spatio-temperal correlations are dictated by the discrepancy between algebraic and geometric multiplicity of distinct eigenvalues corresponding to state transition matrix. Small discrepancies imply that original trajectory essentially comprises of multiple lower dimensional random dynamical systems living on $A$ invariant subspaces and are statistically independent of each other. In the process, we provide first quantitative handle on decay rate of finite powers of state transition matrix $\|A^{k}\|$ . It is shown that when a stable dynamical system has only one distinct eigenvalue and discrepancy of $n-1$: $\|A\|$ has a dependence on $n$, resulting dynamics are spatially inseparable and consequently there exist at least one row with covariates of typical size $Θ\big(\sqrt{N-n+1}$ $e^{n}\big)$ i.e., even under stability assumption, covariates can suffer from curse of dimensionality. In the light of these findings we set the stage for non-asymptotic error analysis in estimation of state transition matrix $A$ via least squares regression on observed trajectory by showing that element-wise error is essentially a variant of well-know Littlewood-Offord problem.
24.0ROApr 17
Semantic Area Graph Reasoning for Multi-Robot Language-Guided SearchRuiyang Wang, Hao-Lun Hsu, Jiwoo Kim et al.
Coordinating multi-robot systems (MRS) to search in unknown environments is particularly challenging for tasks that require semantic reasoning beyond geometric exploration. Classical coordination strategies rely on frontier coverage or information gain and cannot incorporate high-level task intent, such as searching for objects associated with specific room types. We propose \textit{Semantic Area Graph Reasoning} (SAGR), a hierarchical framework that enables Large Language Models (LLMs) to coordinate multi-robot exploration and semantic search through a structured semantic-topological abstraction of the environment. SAGR incrementally constructs a semantic area graph from a semantic occupancy map, encoding room instances, connectivity, frontier availability, and robot states into a compact task-relevant representation for LLM reasoning. The LLM performs high-level semantic room assignment based on spatial structure and task context, while deterministic frontier planning and local navigation handle geometric execution within assigned rooms. Experiments on the Habitat-Matterport3D dataset across 100 scenarios show that SAGR remains competitive with state-of-the-art exploration methods while consistently improving semantic target search efficiency, with up to 18.8\% in large environments. These results highlight the value of structured semantic abstractions as an effective interface between LLM-based reasoning and multi-robot coordination in complex indoor environments.
LGFeb 26
NNiT: Width-Agnostic Neural Network Generation with Structurally Aligned Weight SpacesJiwoo Kim, Swarajh Mehta, Hao-Lun Hsu et al.
Generative modeling of neural network parameters is often tied to architectures because standard parameter representations rely on known weight-matrix dimensions. Generation is further complicated by permutation symmetries that allow networks to model similar input-output functions while having widely different, unaligned parameterizations. In this work, we introduce Neural Network Diffusion Transformers (NNiTs), which generate weights in a width-agnostic manner by tokenizing weight matrices into patches and modeling them as locally structured fields. We establish that Graph HyperNetworks (GHNs) with a convolutional neural network (CNN) decoder structurally align the weight space, creating the local correlation necessary for patch-based processing. Focusing on MLPs, where permutation symmetry is especially apparent, NNiT generates fully functional networks across a range of architectures. Our approach jointly models discrete architecture tokens and continuous weight patches within a single sequence model. On ManiSkill3 robotics tasks, NNiT achieves >85% success on architecture topologies unseen during training, while baseline approaches fail to generalize.
LGDec 7, 2022
Concentration Phenomenon for Random Dynamical Systems: An Operator Theoretic ApproachMuhammad Abdullah Naeem, Miroslav Pajic
Via operator theoretic methods, we formalize the concentration phenomenon for a given observable `$r$' of a discrete time Markov chain with `$μ_π$' as invariant ergodic measure, possibly having support on an unbounded state space. The main contribution of this paper is circumventing tedious probabilistic methods with a study of a composition of the Markov transition operator $P$ followed by a multiplication operator defined by $e^{r}$. It turns out that even if the observable/ reward function is unbounded, but for some for some $q>2$, $\|e^{r}\|_{q \rightarrow 2} \propto \exp\big(μ_π(r) +\frac{2q}{q-2}\big) $ and $P$ is hyperbounded with norm control $\|P\|_{2 \rightarrow q }< e^{\frac{1}{2}[\frac{1}{2}-\frac{1}{q}]}$, sharp non-asymptotic concentration bounds follow. \emph{Transport-entropy} inequality ensures the aforementioned upper bound on multiplication operator for all $q>2$. The role of \emph{reversibility} in concentration phenomenon is demystified. These results are particularly useful for the reinforcement learning and controls communities as they allow for concentration inequalities w.r.t standard unbounded obersvables/reward functions where exact knowledge of the system is not available, let alone the reversibility of stationary measure.
32.5LGMay 6
Adaptive Policy Selection and Fine-Tuning under Interaction Budgets for Offline-to-Online Reinforcement LearningAlper Kamil Bozkurt, Xiaoan Xu, Shangtong Zhang et al.
In offline-to-online reinforcement learning (O2O-RL), policies are first safely trained offline using previously collected datasets and then further fine-tuned for tasks via limited online interactions. In a typical O2O-RL pipeline, candidate policies trained with offline RL are evaluated via either off-policy evaluation (OPE) or online evaluation (OE). The policy with the highest estimated value is then deployed and continually fine-tuned. However, this setup has two main issues. First, OPE can be unreliable, making it risky to deploy a policy based solely on those estimates, whereas OE may identify a viable policy with substantial online interaction, which could have been used for fine-tuning. Second--and more importantly--it is also often not possible to determine a priori whether a pretrained policy will improve with post-deployment fine-tuning, especially in non-stationary environments. As a result, procedures committing to a single deployed policy are impractical in many real-world settings. Moreover, a naive remedy that exhaustively fine-tunes all candidates would violate interaction budget constraints and is likewise infeasible. In this paper, we propose a novel adaptive approach for policy selection and fine-tuning under online interaction budgets in O2O-RL. Following the standard pipeline, we first train a set of candidate policies with different offline RL algorithms and hyperparameters; we then perform OPE to obtain initial performance estimates. We next adaptively select and fine-tune the policies based on their predicted performance via an upper-confidence-bound approach thereby making efficient use of online interactions. We demonstrate that our approach improves upon O2O-RL baselines with various benchmarks.
LGApr 16, 2024
Randomized Exploration in Cooperative Multi-Agent Reinforcement LearningHao-Lun Hsu, Weixin Wang, Miroslav Pajic et al.
We present the first study on provably efficient randomized exploration in cooperative multi-agent reinforcement learning (MARL). We propose a unified algorithm framework for randomized exploration in parallel Markov Decision Processes (MDPs), and two Thompson Sampling (TS)-type algorithms, CoopTS-PHE and CoopTS-LMC, incorporating the perturbed-history exploration (PHE) strategy and the Langevin Monte Carlo exploration (LMC) strategy, respectively, which are flexible in design and easy to implement in practice. For a special class of parallel MDPs where the transition is (approximately) linear, we theoretically prove that both CoopTS-PHE and CoopTS-LMC achieve a $\widetilde{\mathcal{O}}(d^{3/2}H^2\sqrt{MK})$ regret bound with communication complexity $\widetilde{\mathcal{O}}(dHM^2)$, where $d$ is the feature dimension, $H$ is the horizon length, $M$ is the number of agents, and $K$ is the number of episodes. This is the first theoretical result for randomized exploration in cooperative MARL. We evaluate our proposed method on multiple parallel RL environments, including a deep exploration problem (i.e., $N$-chain), a video game, and a real-world problem in energy systems. Our experimental results support that our framework can achieve better performance, even under conditions of misspecified transition models. Additionally, we establish a connection between our unified framework and the practical application of federated learning.
AIApr 7, 2024
On the Uniqueness of Solution for the Bellman Equation of LTL ObjectivesZetong Xuan, Alper Kamil Bozkurt, Miroslav Pajic et al.
Surrogate rewards for linear temporal logic (LTL) objectives are commonly utilized in planning problems for LTL objectives. In a widely-adopted surrogate reward approach, two discount factors are used to ensure that the expected return approximates the satisfaction probability of the LTL objective. The expected return then can be estimated by methods using the Bellman updates such as reinforcement learning. However, the uniqueness of the solution to the Bellman equation with two discount factors has not been explicitly discussed. We demonstrate with an example that when one of the discount factors is set to one, as allowed in many previous works, the Bellman equation may have multiple solutions, leading to inaccurate evaluation of the expected return. We then propose a condition for the Bellman equation to have the expected return as the unique solution, requiring the solutions for states inside a rejecting bottom strongly connected component (BSCC) to be 0. We prove this condition is sufficient by showing that the solutions for the states with discounting can be separated from those for the states without discounting under this condition
SEFeb 17, 2025
NeuroStrata: Harnessing Neurosymbolic Paradigms for Improved Design, Testability, and Verifiability of Autonomous CPSXi Zheng, Ziyang Li, Ivan Ruchkin et al.
Autonomous cyber-physical systems (CPSs) leverage AI for perception, planning, and control but face trust and safety certification challenges due to inherent uncertainties. The neurosymbolic paradigm replaces stochastic layers with interpretable symbolic AI, enabling determinism. While promising, challenges like multisensor fusion, adaptability, and verification remain. This paper introduces NeuroStrata, a neurosymbolic framework to enhance the testing and verification of autonomous CPS. We outline its key components, present early results, and detail future plans.
AIMay 27, 2025
Assured Autonomy with Neuro-Symbolic PerceptionR. Spencer Hallyburton, Miroslav Pajic
Many state-of-the-art AI models deployed in cyber-physical systems (CPS), while highly accurate, are simply pattern-matchers.~With limited security guarantees, there are concerns for their reliability in safety-critical and contested domains. To advance assured AI, we advocate for a paradigm shift that imbues data-driven perception models with symbolic structure, inspired by a human's ability to reason over low-level features and high-level context. We propose a neuro-symbolic paradigm for perception (NeuSPaPer) and illustrate how joint object detection and scene graph generation (SGG) yields deep scene understanding.~Powered by foundation models for offline knowledge extraction and specialized SGG algorithms for real-time deployment, we design a framework leveraging structured relational graphs that ensures the integrity of situational awareness in autonomy. Using physics-based simulators and real-world datasets, we demonstrate how SGG bridges the gap between low-level sensor perception and high-level reasoning, establishing a foundation for resilient, context-aware AI and advancing trusted autonomy in CPS.
CVMar 10, 2025
Probabilistic Segmentation for Robust Field of View EstimationR. Spencer Hallyburton, David Hunt, Yiwei He et al.
Attacks on sensing and perception threaten the safe deployment of autonomous vehicles (AVs). Security-aware sensor fusion helps mitigate threats but requires accurate field of view (FOV) estimation which has not been evaluated autonomy. To address this gap, we adapt classical computer graphics algorithms to develop the first autonomy-relevant FOV estimators and create the first datasets with ground truth FOV labels. Unfortunately, we find that these approaches are themselves highly vulnerable to attacks on sensing. To improve robustness of FOV estimation against attacks, we propose a learning-based segmentation model that captures FOV features, integrates Monte Carlo dropout (MCD) for uncertainty quantification, and performs anomaly detection on confidence maps. We illustrate through comprehensive evaluations attack resistance and strong generalization across environments. Architecture trade studies demonstrate the model is feasible for real-time deployment in multiple applications.
ROSep 30, 2025
LLM-MCoX: Large Language Model-based Multi-robot Coordinated Exploration and SearchRuiyang Wang, Hao-Lun Hsu, David Hunt et al.
Autonomous exploration and object search in unknown indoor environments remain challenging for multi-robot systems (MRS). Traditional approaches often rely on greedy frontier assignment strategies with limited inter-robot coordination. In this work, we introduce LLM-MCoX (LLM-based Multi-robot Coordinated Exploration and Search), a novel framework that leverages Large Language Models (LLMs) for intelligent coordination of both homogeneous and heterogeneous robot teams tasked with efficient exploration and target object search. Our approach combines real-time LiDAR scan processing for frontier cluster extraction and doorway detection with multimodal LLM reasoning (e.g., GPT-4o) to generate coordinated waypoint assignments based on shared environment maps and robot states. LLM-MCoX demonstrates superior performance compared to existing methods, including greedy and Voronoi-based planners, achieving 22.7% faster exploration times and 50% improved search efficiency in large environments with 6 robots. Notably, LLM-MCoX enables natural language-based object search capabilities, allowing human operators to provide high-level semantic guidance that traditional algorithms cannot interpret.
ROJul 1, 2025
RaGNNarok: A Light-Weight Graph Neural Network for Enhancing Radar Point Clouds on Unmanned Ground VehiclesDavid Hunt, Shaocheng Luo, Spencer Hallyburton et al.
Low-cost indoor mobile robots have gained popularity with the increasing adoption of automation in homes and commercial spaces. However, existing lidar and camera-based solutions have limitations such as poor performance in visually obscured environments, high computational overhead for data processing, and high costs for lidars. In contrast, mmWave radar sensors offer a cost-effective and lightweight alternative, providing accurate ranging regardless of visibility. However, existing radar-based localization suffers from sparse point cloud generation, noise, and false detections. Thus, in this work, we introduce RaGNNarok, a real-time, lightweight, and generalizable graph neural network (GNN)-based framework to enhance radar point clouds, even in complex and dynamic environments. With an inference time of just 7.3 ms on the low-cost Raspberry Pi 5, RaGNNarok runs efficiently even on such resource-constrained devices, requiring no additional computational resources. We evaluate its performance across key tasks, including localization, SLAM, and autonomous navigation, in three different environments. Our results demonstrate strong reliability and generalizability, making RaGNNarok a robust solution for low-cost indoor mobile robots.
LGOct 26, 2024
Off-Policy Selection for Initiating Human-Centric Experimental DesignGe Gao, Xi Yang, Qitong Gao et al.
In human-centric tasks such as healthcare and education, the heterogeneity among patients and students necessitates personalized treatments and instructional interventions. While reinforcement learning (RL) has been utilized in those tasks, off-policy selection (OPS) is pivotal to close the loop by offline evaluating and selecting policies without online interactions, yet current OPS methods often overlook the heterogeneity among participants. Our work is centered on resolving a pivotal challenge in human-centric systems (HCSs): how to select a policy to deploy when a new participant joining the cohort, without having access to any prior offline data collected over the participant? We introduce First-Glance Off-Policy Selection (FPS), a novel approach that systematically addresses participant heterogeneity through sub-group segmentation and tailored OPS criteria to each sub-group. By grouping individuals with similar traits, FPS facilitates personalized policy selection aligned with unique characteristics of each participant or group of participants. FPS is evaluated via two important but challenging applications, intelligent tutoring systems and a healthcare application for sepsis treatment and intervention. FPS presents significant advancement in enhancing learning outcomes of students and in-hospital care outcomes.
LGMar 11, 2024
ε-Neural Thompson Sampling of Deep Brain Stimulation for Parkinson Disease TreatmentHao-Lun Hsu, Qitong Gao, Miroslav Pajic
Deep Brain Stimulation (DBS) stands as an effective intervention for alleviating the motor symptoms of Parkinson's disease (PD). Traditional commercial DBS devices are only able to deliver fixed-frequency periodic pulses to the basal ganglia (BG) regions of the brain, i.e., continuous DBS (cDBS). However, they in general suffer from energy inefficiency and side effects, such as speech impairment. Recent research has focused on adaptive DBS (aDBS) to resolve the limitations of cDBS. Specifically, reinforcement learning (RL) based approaches have been developed to adapt the frequencies of the stimuli in order to achieve both energy efficiency and treatment efficacy. However, RL approaches in general require significant amount of training data and computational resources, making it intractable to integrate RL policies into real-time embedded systems as needed in aDBS. In contrast, contextual multi-armed bandits (CMAB) in general lead to better sample efficiency compared to RL. In this study, we propose a CMAB solution for aDBS. Specifically, we define the context as the signals capturing irregular neuronal firing activities in the BG regions (i.e., beta-band power spectral density), while each arm signifies the (discretized) pulse frequency of the stimulation. Moreover, an ε-exploring strategy is introduced on top of the classic Thompson sampling method, leading to an algorithm called ε-Neural Thompson sampling (ε-NeuralTS), such that the learned CMAB policy can better balance exploration and exploitation of the BG environment. The ε-NeuralTS algorithm is evaluated using a computation BG model that captures the neuronal activities in PD patients' brains. The results show that our method outperforms both existing cDBS methods and CMAB baselines.
STDec 10, 2023
Spectral Statistics of the Sample Covariance Matrix for High Dimensional Linear GaussiansMuhammad Abdullah Naeem, Miroslav Pajic
Performance of ordinary least squares(OLS) method for the \emph{estimation of high dimensional stable state transition matrix} $A$(i.e., spectral radius $ρ(A)<1$) from a single noisy observed trajectory of the linear time invariant(LTI)\footnote{Linear Gaussian (LG) in Markov chain literature} system $X_{-}:(x_0,x_1, \ldots,x_{N-1})$ satisfying \begin{equation} x_{t+1}=Ax_{t}+w_{t}, \hspace{10pt} \text{ where } w_{t} \thicksim N(0,I_{n}), \end{equation} heavily rely on negative moments of the sample covariance matrix: $(X_{-}X_{-}^{*})=\sum_{i=0}^{N-1}x_{i}x_{i}^{*}$ and singular values of $EX_{-}^{*}$, where $E$ is a rectangular Gaussian ensemble $E=[w_0, \ldots, w_{N-1}]$. Negative moments requires sharp estimates on all the eigenvalues $λ_{1}\big(X_{-}X_{-}^{*}\big) \geq \ldots \geq λ_{n}\big(X_{-}X_{-}^{*}\big) \geq 0$. Leveraging upon recent results on spectral theorem for non-Hermitian operators in \cite{naeem2023spectral}, along with concentration of measure phenomenon and perturbation theory(Gershgorins' and Cauchys' interlacing theorem) we show that only when $A=A^{*}$, typical order of $λ_{j}\big(X_{-}X_{-}^{*}\big) \in \big[N-n\sqrt{N}, N+n\sqrt{N}\big]$ for all $j \in [n]$. However, in \emph{high dimensions} when $A$ has only one distinct eigenvalue $λ$ with geometric multiplicity of one, then as soon as eigenvalue leaves \emph{complex half unit disc}, largest eigenvalue suffers from curse of dimensionality: $λ_{1}\big(X_{-}X_{-}^{*}\big)=Ω\big( \lfloor\frac{N}{n}\rfloor e^{α_λn} \big)$, while smallest eigenvalue $λ_{n}\big(X_{-}X_{-}^{*}\big) \in (0, N+\sqrt{N}]$. Consequently, OLS estimator incurs a \emph{phase transition} and becomes \emph{transient: increasing iteration only worsens estimation error}, all of this happening when the dynamics are generated from stable systems.
LGJul 5, 2021
Gradient Importance Learning for Incomplete ObservationsQitong Gao, Dong Wang, Joshua D. Amason et al.
Though recent works have developed methods that can generate estimates (or imputations) of the missing entries in a dataset to facilitate downstream analysis, most depend on assumptions that may not align with real-world applications and could suffer from poor performance in subsequent tasks such as classification. This is particularly true if the data have large missingness rates or a small sample size. More importantly, the imputation error could be propagated into the prediction step that follows, which may constrain the capabilities of the prediction model. In this work, we introduce the gradient importance learning (GIL) method to train multilayer perceptrons (MLPs) and long short-term memories (LSTMs) to directly perform inference from inputs containing missing values without imputation. Specifically, we employ reinforcement learning (RL) to adjust the gradients used to train these models via back-propagation. This allows the model to exploit the underlying information behind missingness patterns. We test the approach on real-world time-series (i.e., MIMIC-III), tabular data obtained from an eye clinic, and a standard dataset (i.e., MNIST), where our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
CRJun 13, 2021
Security Analysis of Camera-LiDAR Fusion Against Black-Box Attacks on Autonomous VehiclesR. Spencer Hallyburton, Yupei Liu, Yulong Cao et al.
To enable safe and reliable decision-making, autonomous vehicles (AVs) feed sensor data to perception algorithms to understand the environment. Sensor fusion with multi-frame tracking is becoming increasingly popular for detecting 3D objects. Thus, in this work, we perform an analysis of camera-LiDAR fusion, in the AV context, under LiDAR spoofing attacks. Recently, LiDAR-only perception was shown vulnerable to LiDAR spoofing attacks; however, we demonstrate these attacks are not capable of disrupting camera-LiDAR fusion. We then define a novel, context-aware attack: frustum attack, and show that out of 8 widely used perception algorithms - across 3 architectures of LiDAR-only and 3 architectures of camera-LiDAR fusion - all are significantly vulnerable to the frustum attack. In addition, we demonstrate that the frustum attack is stealthy to existing defenses against LiDAR spoofing as it preserves consistencies between camera and LiDAR semantics. Finally, we show that the frustum attack can be exercised consistently over time to form stealthy longitudinal attack sequences, compromising the tracking module and creating adverse outcomes on end-to-end AV control.
ROApr 4, 2021
Reinforcement Learning with Temporal Logic Constraints for Partially-Observable Markov Decision ProcessesYu Wang, Alper Kamil Bozkurt, Miroslav Pajic
This paper proposes a reinforcement learning method for controller synthesis of autonomous systems in unknown and partially-observable environments with subjective time-dependent safety constraints. Mathematically, we model the system dynamics by a partially-observable Markov decision process (POMDP) with unknown transition/observation probabilities. The time-dependent safety constraint is captured by iLTL, a variation of linear temporal logic for state distributions. Our Reinforcement learning method first constructs the belief MDP of the POMDP, capturing the time evolution of estimated state distributions. Then, by building the product belief MDP of the belief MDP and the limiting deterministic B\uchi automaton (LDBA) of the temporal logic constraint, we transform the time-dependent safety constraint on the POMDP into a state-dependent constraint on the product belief MDP. Finally, we learn the optimal policy by value iteration under the state-dependent constraint.
ROMar 26, 2021
Model-Free Learning of Safe yet Effective ControllersAlper Kamil Bozkurt, Yu Wang, Miroslav Pajic
We study the problem of learning safe control policies that are also effective; i.e., maximizing the probability of satisfying a linear temporal logic (LTL) specification of a task, and the discounted reward capturing the (classic) control performance. We consider unknown environments modeled as Markov decision processes. We propose a model-free reinforcement learning algorithm that learns a policy that first maximizes the probability of ensuring safety, then the probability of satisfying the given LTL specification and lastly, the sum of discounted Quality of Control rewards. Finally, we illustrate applicability of our RL-based approach.
CRMar 10, 2021
Learning-Based Vulnerability Analysis of Cyber-Physical SystemsAmir Khazraei, Spencer Hallyburton, Qitong Gao et al.
This work focuses on the use of deep learning for vulnerability analysis of cyber-physical systems (CPS). Specifically, we consider a control architecture widely used in CPS (e.g., robotics), where the low-level control is based on e.g., the extended Kalman filter (EKF) and an anomaly detector. To facilitate analyzing the impact potential sensing attacks could have, our objective is to develop learning-enabled attack generators capable of designing stealthy attacks that maximally degrade system operation. We show how such problem can be cast within a learning-based grey-box framework where parts of the runtime information are known to the attacker, and introduce two models based on feed-forward neural networks (FNN); both models are trained offline, using a cost function that combines the attack effects on the estimation error and the residual signal used for anomaly detection, so that the trained models are capable of recursively generating such effective sensor attacks in real-time. The effectiveness of the proposed methods is illustrated on several case studies.
SYMar 8, 2021
Formal Verification of Stochastic Systems with ReLU Neural Network ControllersShiqi Sun, Yan Zhang, Xusheng Luo et al.
In this work, we address the problem of formal safety verification for stochastic cyber-physical systems (CPS) equipped with ReLU neural network (NN) controllers. Our goal is to find the set of initial states from where, with a predetermined confidence, the system will not reach an unsafe configuration within a specified time horizon. Specifically, we consider discrete-time LTI systems with Gaussian noise, which we abstract by a suitable graph. Then, we formulate a Satisfiability Modulo Convex (SMC) problem to estimate upper bounds on the transition probabilities between nodes in the graph. Using this abstraction, we propose a method to compute tight bounds on the safety probabilities of nodes in this graph, despite possible over-approximations of the transition probabilities between these nodes. Additionally, using the proposed SMC formula, we devise a heuristic method to refine the abstraction of the system in order to further improve the estimated safety bounds. Finally, we corroborate the efficacy of the proposed method with simulation results considering a robot navigation example and comparison against a state-of-the-art verification scheme.
AIFeb 8, 2021
Learning Optimal Strategies for Temporal Tasks in Stochastic GamesAlper Kamil Bozkurt, Yu Wang, Michael M. Zavlanos et al.
Synthesis from linear temporal logic (LTL) specifications provides assured controllers for systems operating in stochastic and potentially adversarial environments. Automatic synthesis tools, however, require a model of the environment to construct controllers. In this work, we introduce a model-free reinforcement learning (RL) approach to derive controllers from given LTL specifications even when the environment is completely unknown. We model the problem as a stochastic game (SG) between the controller and the adversarial environment; we then learn optimal control strategies that maximize the probability of satisfying the LTL specifications against the worst-case environment behavior. We first construct a product game using the deterministic parity automaton (DPA) translated from the given LTL specification. By deriving distinct rewards and discount factors from the acceptance condition of the DPA, we reduce the maximization of the worst-case probability of satisfying the LTL specification into the maximization of a discounted reward objective in the product game; this enables the use of model-free RL algorithms to learn an optimal controller strategy. To deal with the common scalability problems when the number of sets defining the acceptance condition of the DPA (usually referred as colors), is large, we propose a lazy color generation method where distinct rewards and discount factors are utilized only when needed, and an approximate method where the controller eventually focuses on only one color. In several case studies, we show that our approach is scalable to a wide range of LTL formulas, significantly outperforming existing methods for learning controllers from LTL specifications in SGs.
RONov 3, 2020
Secure Planning Against Stealthy Attacks via Model-Free Reinforcement LearningAlper Kamil Bozkurt, Yu Wang, Miroslav Pajic
We consider the problem of security-aware planning in an unknown stochastic environment, in the presence of attacks on control signals (i.e., actuators) of the robot. We model the attacker as an agent who has the full knowledge of the controller as well as the employed intrusion-detection system and who wants to prevent the controller from performing tasks while staying stealthy. We formulate the problem as a stochastic game between the attacker and the controller and present an approach to express the objective of such an agent and the controller as a combined linear temporal logic (LTL) formula. We then show that the planning problem, described formally as the problem of satisfying an LTL formula in a stochastic game, can be solved via model-free reinforcement learning when the environment is completely unknown. Finally, we illustrate and evaluate our methods on two robotic planning case studies.
ROOct 2, 2020
Model-Free Reinforcement Learning for Stochastic Games with Linear Temporal Logic ObjectivesAlper Kamil Bozkurt, Yu Wang, Michael Zavlanos et al.
We study the problem of synthesizing control strategies for Linear Temporal Logic (LTL) objectives in unknown environments. We model this problem as a turn-based zero-sum stochastic game between the controller and the environment, where the transition probabilities and the model topology are fully unknown. The winning condition for the controller in this game is the satisfaction of the given LTL specification, which can be captured by the acceptance condition of a deterministic Rabin automaton (DRA) directly derived from the LTL specification. We introduce a model-free reinforcement learning (RL) methodology to find a strategy that maximizes the probability of satisfying a given LTL specification when the Rabin condition of the derived DRA has a single accepting pair. We then generalize this approach to LTL formulas for which the Rabin condition has a larger number of accepting pairs, providing a lower bound on the satisfaction probability. Finally, we illustrate applicability of our RL method on two motion planning case studies.
PRJun 15, 2020
Learning Expected Reward for Switched Linear Control Systems: A Non-Asymptotic ViewMuhammad Abdullah Naeem, Miroslav Pajic
In this work, we show existence of invariant ergodic measure for switched linear dynamical systems (SLDSs) under a norm-stability assumption of system dynamics in some unbounded subset of $\mathbb{R}^{n}$. Consequently, given a stationary Markov control policy, we derive non-asymptotic bounds for learning expected reward (w.r.t the invariant ergodic measure our closed-loop system mixes to) from time-averages using Birkhoff's Ergodic Theorem. The presented results provide a foundation for deriving non-asymptotic analysis for average reward-based optimal control of SLDSs. Finally, we illustrate the presented theoretical results in two case-studies.
LGJun 11, 2020
Learning Monotone Dynamics by Neural NetworksYu Wang, Qitong Gao, Miroslav Pajic
Feed-forward neural networks (FNNs) work as standard building blocks in applying artificial intelligence (AI) to the physical world. They allow learning the dynamics of unknown physical systems (e.g., biological and chemical) {to predict their future behavior}. However, they are likely to violate the physical constraints of those systems without proper treatment. This work focuses on imposing two important physical constraints: monotonicity (i.e., a partial order of system states is preserved over time) and stability (i.e., the system states converge over time) when using FNNs to learn physical dynamics. For monotonicity constraints, we propose to use nonnegative neural networks and batch normalization. For both monotonicity and stability constraints, we propose to learn the system dynamics and corresponding Lyapunov function simultaneously. As demonstrated by case studies, our methods can preserve the stability and monotonicity of FNNs and significantly reduce their prediction errors.
RONov 26, 2019
Hyperproperties for Robotics: Planning via HyperLTLYu Wang, Siddhartha Nalluri, Miroslav Pajic
There is a growing interest on formal methods-based robotic planning for temporal logic objectives. In this work, we extend the scope of existing synthesis methods to hyper-temporal logics. We are motivated by the fact that important planning objectives, such as optimality, robustness, and privacy, (maybe implicitly) involve the interrelation between multiple paths. Such objectives are thus hyperproperties, and cannot be expressed with usual temporal logics like the linear temporal logic (LTL). We show that such hyperproperties can be expressed by HyperLTL, an extension of LTL to multiple paths. To handle the complexity of planning with HyperLTL specifications, we introduce a symbolic approach for synthesizing planning strategies on discrete transition systems. Our planning method is evaluated on several case studies.
ROSep 16, 2019
Control Synthesis from Linear Temporal Logic Specifications using Model-Free Reinforcement LearningAlper Kamil Bozkurt, Yu Wang, Michael M. Zavlanos et al.
We present a reinforcement learning (RL) framework to synthesize a control policy from a given linear temporal logic (LTL) specification in an unknown stochastic environment that can be modeled as a Markov Decision Process (MDP). Specifically, we learn a policy that maximizes the probability of satisfying the LTL formula without learning the transition probabilities. We introduce a novel rewarding and path-dependent discounting mechanism based on the LTL formula such that (i) an optimal policy maximizing the total discounted reward effectively maximizes the probabilities of satisfying LTL objectives, and (ii) a model-free RL algorithm using these rewards and discount factors is guaranteed to converge to such policy. Finally, we illustrate the applicability of our RL-based synthesis approach on two motion planning case studies.
OCMar 25, 2019
An Optimal Graph-Search Method for Secure State EstimationXusheng Luo, Miroslav Pajic, Michael M. Zavlanos
The growing complexity of modern Cyber-Physical Systems (CPS) and the frequent communication between their components make them vulnerable to malicious attacks. As a result, secure state estimation is a critical requirement for the control of these systems. Many existing secure state estimation methods suffer from combinatorial complexity which grows with the number of states and sensors in the system. This complexity can be mitigated using optimization-based methods that relax the original state estimation problem, although at the cost of optimality as these methods often identify attack-free sensors as attacked. In this paper, we propose a new optimal graph-search algorithm to correctly identify malicious attacks and to securely estimate the states even in large-scale CPS modeled as linear time-invariant systems. The graph consists of layers, each one containing two nodes capturing a truth assignment of any given sensor, and directed edges connecting adjacent layers only. Then, our algorithm searches the layers of this graph incrementally, favoring directions at higher layers with more attack-free assignments, while actively managing a repository of nodes to be expanded at later iterations. The proposed search bias and the ability to revisit nodes in the repository and self-correct, allow our graph-search algorithm to reach the optimal assignment faster and tackle larger problems. We show that our algorithm is complete and optimal provided that process and measurement noises do not dominate the attack signal. Moreover, we provide numerical simulations that demonstrate the ability of our algorithm to correctly identify attacked sensors and securely reconstruct the state. Our simulations show that our method outperforms existing algorithms both in terms of optimality and execution time.
GTFeb 12, 2019
Security-Aware Synthesis Using Delayed-Action GamesMahmoud Elfar, Yu Wang, Miroslav Pajic
Stochastic multiplayer games (SMGs) have gained attention in the field of strategy synthesis for multi-agent reactive systems. However, standard SMGs are limited to modeling systems where all agents have full knowledge of the state of the game. In this paper, we introduce delayed-action games (DAGs) formalism that simulates hidden-information games (HIGs) as SMGs, where hidden information is captured by delaying a player's actions. The elimination of private variables enables the usage of SMG off-the-shelf model checkers to implement HIGs. Furthermore, we demonstrate how a DAG can be decomposed into subgames that can be independently explored, utilizing parallel computation to reduce the model checking time, while alleviating the state space explosion problem that SMGs are notorious for. In addition, we propose a DAG-based framework for strategy synthesis and analysis. Finally, we demonstrate applicability of the DAG-based synthesis framework on a case study of a human-on-the-loop unmanned-aerial vehicle system under stealthy attacks, where the proposed framework is used to formally model, analyze and synthesize security-aware strategies for the system.
LOFeb 11, 2019
Statistical Model Checking for HyperpropertiesYu Wang, Siddhartha Nalluri, Borzoo Bonakdarpour et al.
Hyperproperties have shown to be a powerful tool for expressing and reasoning about information-flow security policies. In this paper, we investigate the problem of statistical model checking (SMC) for hyperproperties. Unlike exhaustive model checking, SMC works based on drawing samples from the system at hand and evaluate the specification with statistical confidence. The main benefit of applying SMC over exhaustive techniques is its efficiency and scalability. To reason about probabilistic hyperproperties, we first propose the temporal logic HyperPCLT* that extends PCTL* and HyperPCTL. We show that HyperPCLT* can express important probabilistic information-flow security policies that cannot be expressed with HyperPCTL. Then, we introduce SMC algorithms for verifying HyperPCLT* formulas on discrete-time Markov chains, based on sequential probability ratio tests (SPRT) with a new notion of multi-dimensional indifference region. Our SMC algorithms can handle both non-nested and nested probability operators for any desired significance level. To show the effectiveness of our technique, we evaluate our SMC algorithms on four case studies focused on information security: timing side-channel vulnerability in encryption, probabilistic anonymity in dining cryptographers, probabilistic noninterference of parallel programs, and the performance of a randomized cache replacement policy that acts as a countermeasure against cache flush attacks.
SENov 8, 2018
Integrating Security in Resource-Constrained Cyber-Physical SystemsVuk Lesi, Ilija Jovanov, Miroslav Pajic
Defense mechanisms against network-level attacks are commonly based on the use of cryptographic techniques, such as message authentication codes that provide data integrity guarantees. However, such mechanisms require significant resources, which prevents their continuous use in resource-constrained cyber-physical systems. Recently, it was shown how physical properties of plants can be exploited to relax these requirements for systems where sensor measurements and actuator commands are transmitted over a compromised network; specifically, intermittent use of data authentication, can still provide Quality-of-Control (QoC) guarantees even in the presence of false-data injection attacks. Consequently, in this work we focus on integrating security into existing systems, in order to protect against these attacks. We introduce a design-time methodology that incorporates requirements for QoC in the presence of attacks into end-to-end timing constraints for real-time control transactions, which include data acquisition and authentication, communication, and control. This allows us to formulate a mixed integer linear programming-based method for synthesis of schedulable task and message parameters (i.e., deadlines and offsets) that maintain timing requirements of deployed controllers, while adding a sufficient level of protection against attacks; specifically, this method provides suitable intermittent authentication policies that ensure the desired QoC levels under attack. To additionally reduce the security-related bandwidth overhead, we propose the use of cumulative message authentication. Furthermore, we introduce a method for opportunistic use of remaining resources to further improve the overall QoC guarantees while ensuring system schedulability. Finally, we demonstrate applicability of our methodology on synthetic automotive systems as well as an automotive case-study.
OCJul 10, 2017
Relaxing Integrity Requirements for Attack-Resilient Cyber-Physical SystemsIlija Jovanov, Miroslav Pajic
The increase in network connectivity has also resulted in several high-profile attacks on cyber-physical systems. An attacker that manages to access a local network could remotely affect control performance by tampering with sensor measurements delivered to the controller. Recent results have shown that with network-based attacks, such as Man-in-the-Middle attacks, the attacker can introduce an unbounded state estimation error if measurements from a suitable subset of sensors contain false data when delivered to the controller. While these attacks can be addressed with the standard cryptographic tools that ensure data integrity, their continuous use would introduce significant communication and computation overhead. Consequently, we study effects of intermittent data integrity guarantees on system performance under stealthy attacks. We consider linear estimators equipped with a general type of residual-based intrusion detectors (including $χ^2$ and SPRT detectors), and show that even when integrity of sensor measurements is enforced only intermittently, the attack impact is significantly limited; specifically, the state estimation error is bounded or the attacker cannot remain stealthy. Furthermore, we present methods to: (1) evaluate the effects of any given integrity enforcement policy in terms of reachable state-estimation errors for any type of stealthy attacks, and (2) design an enforcement policy that provides the desired estimation error guarantees under attack. Finally, on three automotive case studies we show that even with less than 10% of authenticated messages we can ensure satisfiable control performance in the presence of attacks.
CRMay 29, 2016
Coding Schemes for Securing Cyber-Physical Systems Against Stealthy Data Injection AttacksFei Miao, Quanyan Zhu, Miroslav Pajic et al.
This paper considers a method of coding the sensor outputs in order to detect stealthy false data injection attacks. An intelligent attacker can design a sequence of data injection to sensors and actuators that pass the state estimator and statistical fault detector, based on knowledge of the system parameters. To stay undetected, the injected data should increase the state estimation errors while keep the estimation residues small. We employ a coding matrix to change the original sensor outputs to increase the estimation residues under intelligent data injection attacks. This is a low cost method compared with encryption schemes over all sensor measurements in communication networks. We show the conditions of a feasible coding matrix under the assumption that the attacker does not have knowledge of the exact coding matrix. An algorithm is developed to compute a feasible coding matrix, and, we show that in general, multiple feasible coding matrices exist. To defend against attackers who estimates the coding matrix via sensor and actuator measurements, time-varying coding matrices are designed according to the detection requirements. A heuristic algorithm to decide the time length of updating a coding matrix is then proposed.