Andrew Clark

h-index: 22

15 papers

239 citations

Novelty: 50%

AI Score: 29

Ranked #144,834 of 194,257 authors (top 75%); #841 in SY (top 51%)

15 Papers

17.5 | LG | Oct 13, 2023 | Code

Exact Verification of ReLU Neural Control Barrier Functions

Hongchao Zhang, Junlin Wu, Yevgeniy Vorobeychik et al.

Control Barrier Functions (CBFs) are a popular approach for safe control of nonlinear systems. In CBF-based control, the desired safety properties of the system are mapped to nonnegativity of a CBF, and the control input is chosen to ensure that the CBF remains nonnegative for all time. Recently, machine learning methods that represent CBFs as neural networks (neural control barrier functions, or NCBFs) have shown great promise due to the universal representability of neural networks. However, verifying that a learned CBF guarantees safety remains a challenging research problem. This paper presents novel exact conditions and algorithms for verifying safety of feedforward NCBFs with ReLU activation functions. The key challenge in doing so is that, due to the piecewise linearity of the ReLU function, the NCBF will be nondifferentiable at certain points, thus invalidating traditional safety verification methods that assume a smooth barrier function. We resolve this issue by leveraging a generalization of Nagumo's theorem for proving invariance of sets with nonsmooth boundaries to derive necessary and sufficient conditions for safety. Based on this condition, we propose an algorithm for safety verification of NCBFs that first decomposes the NCBF into piecewise linear segments and then solves a nonlinear program to verify safety of each segment as well as the intersections of the linear segments. We mitigate the complexity by only considering the boundary of the safe region and by pruning the segments with Interval Bound Propagation (IBP) and linear relaxation. We evaluate our approach through numerical studies with comparison to state-of-the-art SMT-based methods. Our code is available at https://github.com/HongchaoZhang-HZ/exactverif-reluncbf-nips23.
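The pruning step mentioned in the abstract relies on Interval Bound Propagation. The following is a minimal sketch of generic IBP on a toy ReLU network, not the authors' implementation: the weights, the input box, and the helper names (`affine_bounds`, `relu_bounds`, `ibp`, `may_contain_boundary`) are all assumptions made for illustration. The idea is that a box whose output bounds exclude zero cannot intersect the boundary where the network output is zero, so it can be skipped.

```python
def affine_bounds(W, b, lo, hi):
    """Propagate the box [lo, hi] through y = W x + b, one output row at a time."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        low = up = bias
        for w, a, c in zip(row, lo, hi):
            # A nonnegative weight maps lower bound to lower bound;
            # a negative weight swaps the roles of the two bounds.
            if w >= 0:
                low += w * a
                up += w * c
            else:
                low += w * c
                up += w * a
        out_lo.append(low)
        out_hi.append(up)
    return out_lo, out_hi


def relu_bounds(lo, hi):
    """ReLU is monotone, so it can be applied to each bound directly."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]


def ibp(layers, lo, hi):
    """layers is a list of (W, b); ReLU follows every layer except the last."""
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_bounds(W, b, lo, hi)
        if i < len(layers) - 1:
            lo, hi = relu_bounds(lo, hi)
    return lo, hi


def may_contain_boundary(lo, hi):
    """A box can touch the zero level set only if its bounds straddle zero."""
    return lo[0] <= 0.0 <= hi[0]


# Hypothetical 2-2-1 network and input box, chosen only for illustration.
layers = [
    ([[1.0, -1.0], [0.5, 1.0]], [0.0, -1.0]),
    ([[1.0, -2.0]], [0.5]),
]
out_lo, out_hi = ibp(layers, [0.0, 0.0], [1.0, 1.0])
print(out_lo, out_hi, may_contain_boundary(out_lo, out_hi))
```

IBP bounds are sound but loose, so a box that passes this check still needs the exact verification step; the check only rules boxes out.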

1.2 | SY | Jan 19, 2017

Combinatorial Algorithms for Control of Biological Regulatory Networks

Andrew Clark, Phillip Lee, Basel Alomair et al.

Biological processes, including cell differentiation, organism development, and disease progression, can be interpreted as attractors (fixed points or limit cycles) of an underlying networked dynamical system. In this paper, we study the problem of computing a minimum-size subset of control nodes that can be used to steer a given biological network towards a desired attractor, when the networked system has Boolean dynamics. We first prove that this problem cannot be approximated to any nontrivial factor unless P=NP. We then formulate a sufficient condition and prove that the sufficient condition is equivalent to a target set selection problem, which can be solved using integer linear programming. Furthermore, we show that structural properties of biological networks can be exploited to reduce the computational complexity. We prove that when the network nodes have threshold dynamics and certain topological structures, such as block cactus topology and hierarchical organization, the input selection problem can be solved or approximated in polynomial time. For networks with nested canalyzing dynamics, we propose polynomial-time algorithms that are within a polylogarithmic bound of the global optimum. We validate our approach through numerical study on real-world gene regulatory networks.

1.2 | SY | Jan 10, 2019

On the Structure and Computation of Random Walk Times in Finite Graphs

Andrew Clark, Basel Alomair, Linda Bushnell et al.

We consider random walks in which the walk originates in one set of nodes and then continues until it reaches one or more nodes in a target set. The time required for the walk to reach the target set is of interest in understanding the convergence of Markov processes, as well as applications in control, machine learning, and social sciences. In this paper, we investigate the computational structure of the random walk times as a function of the set of target nodes, and find that the commute, hitting, and cover times all exhibit submodular structure, even in non-stationary random walks. We provide a unifying proof of this structure by considering each of these times as special cases of stopping times. We generalize our framework to walks in which the transition probabilities and target sets are jointly chosen to minimize the travel times, leading to polynomial-time approximation algorithms for choosing target sets. Our results are validated through numerical study.
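As a concrete illustration of the quantity studied here, the sketch below computes expected hitting times to a target set on a fixed Markov chain by iterating the first-step equations. It is an assumption-laden toy, not the paper's method: the function name `expected_hitting_times`, the fixed-point iteration, and the complete-graph example are all chosen for illustration, and the sketch does not cover commute or cover times.

```python
def expected_hitting_times(P, targets, iters=2000):
    """Expected number of steps to reach `targets` from each node.

    Iterates h[i] = 1 + sum_j P[i][j] * h[j] for non-target nodes,
    with h[i] = 0 on the target set. Assumes the target set is
    reachable from every node so that the iteration converges.
    """
    n = len(P)
    h = [0.0] * n
    for _ in range(iters):
        h = [
            0.0 if i in targets
            else 1.0 + sum(P[i][j] * h[j] for j in range(n))
            for i in range(n)
        ]
    return h


# Uniform random walk on the complete graph with 4 nodes (illustrative only).
n = 4
P = [[0.0 if i == j else 1.0 / (n - 1) for j in range(n)] for i in range(n)]

h_one = expected_hitting_times(P, {0})      # single target node
h_two = expected_hitting_times(P, {0, 1})   # enlarged target set
print(h_one[2], h_two[2])
```

Enlarging the target set can only shorten the walk, which the two calls show from node 2; the paper's structural results concern how such times vary across target sets.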

1.2 | SY | May 31, 2016

Submodularity in Input Node Selection for Networked Systems

Andrew Clark, Basel Alomair, Linda Bushnell et al.

Networked systems are systems of interconnected components, in which the dynamics of each component are influenced by the behavior of neighboring components. Examples of networked systems include biological networks, critical infrastructures such as power grids, transportation systems, and the Internet, and social networks. The growing importance of such systems has led to an interest in control of networks to ensure performance, stability, robustness, and resilience. A widely-studied method for controlling networked systems is to directly control a subset of input nodes, which then steer the remaining nodes to their desired states. This article presents submodular optimization approaches for input node selection in networked systems. Submodularity is a property of set functions that enables the development of computationally tractable algorithms with provable optimality bounds. For a variety of physically relevant systems, the physical dynamics have submodular structures that can be exploited to develop efficient input selection algorithms. This article will describe these structures and the resulting algorithms, as well as discuss open problems.
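The kind of algorithm that submodularity enables can be sketched with the classical greedy rule, which for a monotone submodular objective under a cardinality constraint achieves at least a (1 - 1/e) fraction of the optimum. The example below is only a sketch under stated assumptions: the coverage objective and the six-node graph are hypothetical stand-ins, not one of the system-specific objectives the article describes.

```python
def greedy_select(candidates, objective, k):
    """Pick up to k elements, each time adding the largest marginal gain."""
    selected = set()
    for _ in range(k):
        best, best_gain = None, 0.0
        for v in sorted(candidates - selected):
            gain = objective(selected | {v}) - objective(selected)
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # no remaining element improves the objective
            break
        selected.add(best)
    return selected


# Hypothetical reachability sets: node -> nodes it can directly influence.
reach = {
    0: {0, 1, 2, 3},
    1: {1, 0},
    2: {2, 0},
    3: {3, 0, 4},
    4: {4, 3, 5},
    5: {5, 4},
}


def coverage(inputs):
    """Number of nodes influenced by at least one input (a submodular function)."""
    covered = set()
    for v in inputs:
        covered |= reach[v]
    return len(covered)


chosen = greedy_select(set(reach), coverage, k=2)
print(chosen, coverage(chosen))
```

Ties are broken by node index here so the output is deterministic; any tie-breaking rule preserves the approximation bound.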

4.1 | RO | Feb 28, 2024 | Code

Fault Tolerant Neural Control Barrier Functions for Robotic Systems under Sensor Faults and Attacks

Hongchao Zhang, Luyao Niu, Andrew Clark et al.

Safety is a fundamental requirement of many robotic systems. Control barrier function (CBF)-based approaches have been proposed to guarantee the safety of robotic systems. However, the effectiveness of these approaches highly relies on the choice of CBFs. Inspired by the universal approximation power of neural networks, there is a growing trend toward representing CBFs using neural networks, leading to the notion of neural CBFs (NCBFs). Current NCBFs, however, are trained and deployed in benign environments, making them ineffective for scenarios where robotic systems experience sensor faults and attacks. In this paper, we study safety-critical control synthesis for robotic systems under sensor faults and attacks. Our main contribution is the development and synthesis of a new class of CBFs that we term fault tolerant neural control barrier function (FT-NCBF). We derive the necessary and sufficient conditions for FT-NCBFs to guarantee safety, and develop a data-driven method to learn FT-NCBFs by minimizing a loss function constructed using the derived conditions. Using the learned FT-NCBF, we synthesize a control input and formally prove the safety guarantee provided by our approach. We demonstrate our proposed approach using two case studies: obstacle avoidance problem for an autonomous mobile robot and spacecraft rendezvous problem, with code available via https://github.com/HongchaoZhang-HZ/FTNCBF.

6.5 | LG | Mar 29, 2021

Reinforcement Learning Beyond Expectation

Bhaskar Ramasubramanian, Luyao Niu, Andrew Clark et al.

The inputs and preferences of human users are important considerations in situations where these users interact with autonomous cyber or cyber-physical systems. In these scenarios, one is often interested in aligning behaviors of the system with the preferences of one or more human users. Cumulative prospect theory (CPT) is a paradigm that has been empirically shown to model a tendency of humans to view gains and losses differently. In this paper, we consider a setting where an autonomous agent has to learn behaviors in an unknown environment. In traditional reinforcement learning, these behaviors are learned through repeated interactions with the environment by optimizing an expected utility. In order to endow the agent with the ability to closely mimic the behavior of human users, we optimize a CPT-based cost. We introduce the notion of the CPT-value of an action taken in a state, and establish the convergence of an iterative dynamic programming-based approach to estimate this quantity. We develop two algorithms to enable agents to learn policies to optimize the CPT-value, and evaluate these algorithms in environments where a target state has to be reached while avoiding obstacles. We demonstrate that behaviors of the agent learned using these algorithms are better aligned with those of a human user who might be placed in the same environment, and are significantly improved over a baseline that optimizes an expected utility.
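To make the difference between a CPT objective and an expected utility concrete, here is a minimal sketch of the CPT value of a discrete random outcome. It assumes the Tversky and Kahneman (1992) functional forms and parameter estimates (power utilities with exponent 0.88, loss-aversion factor 2.25, weighting exponents 0.61 for gains and 0.69 for losses); the paper's own utility and weighting functions may differ, and the function names are hypothetical.

```python
def weight(p, gamma):
    """Inverse-S probability weighting; overweights small probabilities."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return p**gamma / (p**gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)


def utility_gain(x, alpha=0.88):
    return x**alpha


def utility_loss(x, beta=0.88, lam=2.25):
    # Losses are scaled by lam > 1, which models loss aversion.
    return lam * (-x) ** beta


def cpt_value(outcomes, gamma_gain=0.61, gamma_loss=0.69):
    """outcomes: list of (value, probability) pairs with probabilities summing to 1."""
    gains = sorted(((x, p) for x, p in outcomes if x >= 0), reverse=True)
    losses = sorted((x, p) for x, p in outcomes if x < 0)
    total = 0.0
    tail = 0.0  # probability of a strictly larger gain
    for x, p in gains:
        total += utility_gain(x) * (weight(tail + p, gamma_gain) - weight(tail, gamma_gain))
        tail += p
    tail = 0.0  # probability of a strictly worse loss
    for x, p in losses:
        total -= utility_loss(x) * (weight(tail + p, gamma_loss) - weight(tail, gamma_loss))
        tail += p
    return total


sure_gain = cpt_value([(10.0, 1.0)])
fair_coin = cpt_value([(10.0, 0.5), (-10.0, 0.5)])
print(sure_gain, fair_coin)
```

The fair coin flip has zero expected value but a negative CPT value under these parameters, which is the kind of asymmetry between gains and losses that an expected-utility objective cannot express.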

3.3 | SY | Jul 27, 2020

Privacy-Preserving Resilience of Cyber-Physical Systems to Adversaries

Bhaskar Ramasubramanian, Luyao Niu, Andrew Clark et al.

A cyber-physical system (CPS) is expected to be resilient to more than one type of adversary. In this paper, we consider a CPS that has to satisfy a linear temporal logic (LTL) objective in the presence of two kinds of adversaries. The first adversary has the ability to tamper with inputs to the CPS to influence satisfaction of the LTL objective. The interaction of the CPS with this adversary is modeled as a stochastic game. We synthesize a controller for the CPS to maximize the probability of satisfying the LTL objective under any policy of this adversary. The second adversary is an eavesdropper who can observe labeled trajectories of the CPS generated from the previous step. It could then use this information to launch other kinds of attacks. A labeled trajectory is a sequence of labels, where a label is associated to a state and is linked to the satisfaction of the LTL objective at that state. We use differential privacy to quantify the indistinguishability between states that are related to each other when the eavesdropper sees a labeled trajectory. Two trajectories of equal length will be differentially private if they are differentially private at each state along the respective trajectories. We use a skewed Kantorovich metric to compute distances between probability distributions over states resulting from actions chosen according to policies from related states in order to quantify differential privacy. Moreover, we do this in a manner that does not affect the satisfaction probability of the LTL objective. We validate our approach on a simulation of a UAV that has to satisfy an LTL objective in an adversarial environment.

18.5 | AI | Jan 19, 2020

FRESH: Interactive Reward Shaping in High-Dimensional State Spaces using Human Feedback

Baicen Xiao, Qifan Lu, Bhaskar Ramasubramanian et al.

Reinforcement learning has been successful in training autonomous agents to accomplish goals in complex environments. Although this has been adapted to multiple settings, including robotics and computer games, human players often find it easier to obtain higher rewards in some environments than reinforcement learning algorithms. This is especially true of high-dimensional state spaces where the reward obtained by the agent is sparse or extremely delayed. In this paper, we seek to effectively integrate feedback signals supplied by a human operator with deep reinforcement learning algorithms in high-dimensional state spaces. We call this FRESH (Feedback-based REward SHaping). During training, a human operator is presented with trajectories from a replay buffer and then provides feedback on states and actions in the trajectory. In order to generalize feedback signals provided by the human operator to previously unseen states and actions at test-time, we use a feedback neural network. We use an ensemble of neural networks with a shared network architecture to represent model uncertainty and the confidence of the neural network in its output. The output of the feedback neural network is converted to a shaping reward that is augmented to the reward provided by the environment. We evaluate our approach on the Bowling and Skiing Atari games in the arcade learning environment. Although human experts have been able to achieve high scores in these environments, state-of-the-art deep learning algorithms perform poorly. We observe that FRESH is able to achieve much higher scores than state-of-the-art deep learning algorithms in both environments. FRESH also achieves a 21.4% higher score than a human expert in Bowling and does as well as a human expert in Skiing.

5.4 | LG | Jul 20, 2019

Potential-Based Advice for Stochastic Policy Learning

Baicen Xiao, Bhaskar Ramasubramanian, Andrew Clark et al.

This paper augments the reward received by a reinforcement learning agent with potential functions in order to help the agent learn (possibly stochastic) optimal policies. We show that a potential-based reward shaping scheme is able to preserve optimality of stochastic policies, and demonstrate that the ability of an agent to learn an optimal policy is not affected when this scheme is augmented to soft Q-learning. We propose a method to impart potential based advice schemes to policy gradient algorithms. An algorithm that considers an advantage actor-critic architecture augmented with this scheme is proposed, and we give guarantees on its convergence. Finally, we evaluate our approach on a puddle-jump grid world with indistinguishable states, and the continuous state and action mountain car environment from classical control. Our results indicate that these schemes allow the agent to learn a stochastic optimal policy faster and obtain a higher average reward.
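The reason a potential-based term can leave optimal behavior unchanged is visible in a short calculation. The sketch below uses the standard shaping form F(s, s') = γΦ(s') − Φ(s) and checks that the shaped discounted return of a trajectory differs from the original only by a term depending on its first and last states. The chain environment, the potential `phi`, and all names are assumptions for illustration; the advice schemes in the paper may be more general.

```python
def shaping_reward(phi, s, s_next, gamma):
    """Potential-based shaping term added to the environment reward."""
    return gamma * phi(s_next) - phi(s)


def discounted_return(rewards, gamma):
    return sum(gamma**t * r for t, r in enumerate(rewards))


def phi(s):
    """Hypothetical potential: negative distance to the goal state 3."""
    return -abs(3 - s)


gamma = 0.9
trajectory = [0, 1, 2, 3]   # states visited on a 1-D chain
rewards = [0.0, 0.0, 1.0]   # sparse environment reward on reaching the goal

shaped = [
    r + shaping_reward(phi, s, s_next, gamma)
    for r, s, s_next in zip(rewards, trajectory, trajectory[1:])
]

original_return = discounted_return(rewards, gamma)
shaped_return = discounted_return(shaped, gamma)

# The shaping terms telescope: only the endpoints of the trajectory remain.
offset = gamma ** len(rewards) * phi(trajectory[-1]) - phi(trajectory[0])
print(original_return, shaped_return, offset)
```

Because the offset does not depend on the actions taken in between, trajectories from the same start state that end in the same terminal state keep their relative ordering, while the intermediate shaped rewards give the agent denser feedback.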

1.2 | CY | Jun 4, 2019

A Differentially Private Incentive Design for Traffic Offload to Public Transportation

Luyao Niu, Andrew Clark

Increasingly large trip demands have strained urban transportation capacity, which consequently leads to traffic congestion and rapid growth of greenhouse gas emissions. In this work, we focus on achieving sustainable transportation by incentivizing passengers to switch from private cars to public transport. We address the following challenges. First, the passengers incur inconvenience costs when changing their transit behaviors due to delay and discomfort, and thus need to be reimbursed. Second, the inconvenience cost, however, is unknown to the government when choosing the incentives. Furthermore, changing transit behaviors raises privacy concerns from passengers. An adversary could infer personal information (e.g., daily routine, region of interest, and wealth) by observing the decisions made by the government, which are known to the public. We adopt the concept of differential privacy and propose privacy-preserving incentive designs under two settings, denoted as two-way communication and one-way communication. Under two-way communication, passengers submit bids and then the government determines the incentives, whereas in one-way communication the government simply sets a price without acquiring information from the passengers. Under one-way communication, we focus on how the government should design the incentives without revealing passengers' inconvenience costs while still preserving differential privacy. We formulate the problem as a convex program, and propose a differentially private and near-optimal solution algorithm. A numerical case study using the Caltrans Performance Measurement System (PeMS) data source is presented as evaluation. The results show that the proposed approaches achieve a win-win situation in which both the government and passengers obtain non-negative utilities.

8.5 | CR | Jul 25, 2018

Shape of the Cloak: Formal Analysis of Clock Skew-Based Intrusion Detection System in Controller Area Networks

Xuhang Ying, Sang Uk Sagong, Andrew Clark et al.

This paper presents a new masquerade attack called the cloaking attack and provides formal analyses for clock skew-based Intrusion Detection Systems (IDSs) that detect masquerade attacks in the Controller Area Network (CAN) in automobiles. In the cloaking attack, the adversary manipulates the message inter-transmission times of spoofed messages by adding delays so as to emulate a desired clock skew and avoid detection. In order to predict and characterize the impact of the cloaking attack in terms of the attack success probability on a given CAN bus and IDS, we develop formal models for two clock skew-based IDSs, i.e., the state-of-the-art (SOTA) IDS and its adaptation to the widely used Network Time Protocol (NTP), using parameters of the attacker, the detector, and the hardware platform. To the best of our knowledge, this is the first paper that provides formal analyses of clock skew-based IDSs in automotive CAN. We implement the cloaking attack on two hardware testbeds, a prototype and a real vehicle (the University of Washington (UW) EcoCAR), and demonstrate its effectiveness against both the SOTA and NTP-based IDSs. We validate our formal analyses through extensive experiments for different messages, IDS settings, and vehicles. By comparing each predicted attack success probability curve against its experimental curve, we find that the average prediction error is within 3.0% for the SOTA IDS and 5.7% for the NTP-based IDS.

18.0 | CR | Oct 7, 2017

Cloaking the Clock: Emulating Clock Skew in Controller Area Networks

Sang Uk Sagong, Xuhang Ying, Andrew Clark et al.

Automobiles are equipped with Electronic Control Units (ECUs) that communicate via in-vehicle network protocol standards such as Controller Area Network (CAN). These protocols are designed under the assumption that separating in-vehicle communications from external networks is sufficient for protection against cyber attacks. This assumption, however, has been shown to be invalid by recent attacks in which adversaries were able to infiltrate the in-vehicle network. Motivated by these attacks, intrusion detection systems (IDSs) have been proposed for in-vehicle networks that attempt to detect attacks by making use of device fingerprinting using properties such as clock skew of an ECU. In this paper, we propose the cloaking attack, an intelligent masquerade attack in which an adversary modifies the timing of transmitted messages in order to match the clock skew of a targeted ECU. The attack leverages the fact that, while the clock skew is a physical property of each ECU that cannot be changed by the adversary, the estimation of the clock skew by other ECUs is based on network traffic, which, being a cyber component only, can be modified by an adversary. We implement the proposed cloaking attack and test it on two IDSs, namely, the current state-of-the-art IDS and a new IDS that we develop based on the widely-used Network Time Protocol (NTP). We implement the cloaking attack on two hardware testbeds, a prototype and a real connected vehicle, and show that it can always deceive both IDSs. We also introduce a new metric called the Maximum Slackness Index to quantify the effectiveness of the cloaking attack even when the adversary is unable to precisely match the clock skew of the targeted ECU.

1.2 | SY | Oct 30, 2015

Global Practical Node and Edge Synchronization in Kuramoto Networks: A Submodular Optimization Framework

Andrew Clark, Basel Alomair, Linda Bushnell et al.

Synchronization underlies phenomena including memory and perception in the brain, coordinated motion of animal flocks, and stability of the power grid. These synchronization phenomena are often modeled through networks of phase-coupled oscillating nodes. Heterogeneity in the node dynamics, however, may prevent such networks from achieving the required level of synchronization. In order to guarantee synchronization, external inputs can be used to pin a subset of nodes to a reference frequency, while the remaining nodes are steered toward synchronization via local coupling. In this paper, we present a submodular optimization framework for selecting a set of nodes to act as external inputs in order to achieve synchronization from almost any initial network state. We derive threshold-based sufficient conditions for synchronization, and then prove that these conditions are equivalent to connectivity of a class of augmented network graphs. Based on this connection, we map the sufficient conditions for synchronization to constraints on submodular functions, leading to efficient algorithms with provable optimality bounds for selecting input nodes. We illustrate our approach via numerical studies of synchronization in networks from power systems, wireless networks, and neuronal networks.
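The setting described here, phase-coupled oscillators with a subset pinned to a reference frequency, can be simulated in a few lines. The sketch below is an illustrative forward-Euler simulation of the standard Kuramoto model, not the paper's selection algorithm: the coupling matrix, natural frequencies, initial phases, step size, and the choice of node 0 as the single input are all assumptions.

```python
import math


def simulate_kuramoto(omega, K, theta0, inputs=(), omega_ref=0.0, dt=0.01, steps=5000):
    """Forward-Euler integration of d(theta_i)/dt = omega_i + sum_j K[i][j] sin(theta_j - theta_i).

    Nodes listed in `inputs` ignore their neighbors and rotate at omega_ref.
    """
    n = len(omega)
    theta = list(theta0)
    for _ in range(steps):
        updated = []
        for i in range(n):
            if i in inputs:
                rate = omega_ref
            else:
                rate = omega[i] + sum(
                    K[i][j] * math.sin(theta[j] - theta[i]) for j in range(n)
                )
            updated.append(theta[i] + dt * rate)
        theta = updated
    return theta


def order_parameter(theta):
    """Magnitude of the mean phasor; 1.0 means all phases coincide."""
    n = len(theta)
    re = sum(math.cos(t) for t in theta) / n
    im = sum(math.sin(t) for t in theta) / n
    return math.hypot(re, im)


# Hypothetical 4-node complete network with heterogeneous natural frequencies.
omega = [0.1, -0.1, 0.05, -0.05]
K = [[0.0 if i == j else 2.0 for j in range(4)] for i in range(4)]
theta0 = [0.0, 1.0, -1.0, 0.5]

r_start = order_parameter(theta0)
r_end = order_parameter(simulate_kuramoto(omega, K, theta0, inputs={0}))
print(r_start, r_end)
```

With heterogeneous frequencies the phases settle close to the pinned node without coinciding exactly, which is why the paper speaks of practical rather than exact synchronization.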

2.3 | SI | Jul 24, 2012

SODEXO: A System Framework for Deployment and Exploitation of Deceptive Honeybots in Social Networks

Quanyan Zhu, Andrew Clark, Radha Poovendran et al.

As social networking sites such as Facebook and Twitter are becoming increasingly popular, a growing number of malicious attacks, such as phishing and malware, are exploiting them. Among these attacks, social botnets have sophisticated infrastructure that leverages compromised users' accounts, known as bots, to automate the creation of new social networking accounts for spamming and malware propagation. Traditional defense mechanisms are often passive and reactive to non-zero-day attacks. In this paper, we adopt a proactive approach for enhancing security in social networks by infiltrating botnets with honeybots. We propose an integrated system named SODEXO which can be interfaced with social networking sites for creating deceptive honeybots and leveraging them for gaining information from botnets. We establish a Stackelberg game framework to capture strategic interactions between honeybots and botnets, and use quantitative methods to understand the tradeoffs of honeybots for their deployment and exploitation in social networks. We design a protection and alert system that integrates both microscopic and macroscopic models of honeybots and optimally determines the security strategies for honeybots. We corroborate the proposed mechanism with extensive simulations and comparisons with passive defenses.

2.5 | NE | Jun 28, 2012

Piecewise Linear Topology, Evolutionary Algorithms, and Optimization Problems

Andrew Clark

Schemata theory, Markov chains, and statistical mechanics have been used to explain how evolutionary algorithms (EAs) work. Incremental success has been achieved with all of these methods, but each has been stymied by limitations related to its less-than-global view. We show that moving the investigation into topological space improves our understanding of why EAs work.