Claire J. Tomlin

SY
h-index: 54
75 papers
4,066 citations
Novelty: 52%
AI Score: 59

75 Papers

SY · Jul 4, 2023
Stranding Risk for Underactuated Vessels in Complex Ocean Currents: Analysis and Controllers

Andreas Doering, Marius Wiggert, Hanna Krasowski et al. · mit

Low-propulsion vessels can take advantage of powerful ocean currents to navigate towards a destination. Recent results demonstrated that vessels can reach their destination with high probability despite forecast errors. However, these results do not consider a critical aspect of such vessels: safety. Because their propulsion is much smaller than the magnitude of the currents, they might end up in currents that inevitably push them into unsafe areas such as shallow areas, garbage patches, and shipping lanes. In this work, we first investigate the risk of stranding for free-floating vessels in the Northeast Pacific. We find that at least 5.04% of them would strand within 90 days. Next, we encode the unsafe sets as hard constraints into Hamilton-Jacobi Multi-Time Reachability (HJ-MTR) to synthesize a feedback policy that is equivalent to re-planning at each time step, at low computational cost. While applying this policy closed-loop guarantees safe operation when the currents are known, in realistic situations only imperfect forecasts are available. We demonstrate the safety of our approach in such realistic situations empirically with large-scale simulations of a vessel navigating in high-risk regions in the Northeast Pacific. We find that applying our policy closed-loop with daily re-planning on new forecasts can ensure safety with high probability even under forecast errors that exceed the maximal propulsion. Our method significantly improves safety over the baselines and still achieves a timely arrival of the vessel at the destination.

SY · Sep 21, 2017
Hamilton-Jacobi Reachability: A Brief Overview and Recent Advances

Somil Bansal, Mo Chen, Sylvia Herbert et al.

Hamilton-Jacobi (HJ) reachability analysis is an important formal verification method for guaranteeing performance and safety properties of dynamical systems; it has been applied to many small-scale systems in the past decade. Its advantages include compatibility with general nonlinear system dynamics, formal treatment of bounded disturbances, and the availability of well-developed numerical tools. The main challenge is addressing its exponential computational complexity with respect to the number of state variables. In this tutorial, we present an overview of basic HJ reachability theory and provide instructions for using the most recent numerical tools, including an efficient GPU-parallelized implementation of a Level Set Toolbox for computing reachable sets. In addition, we review some of the current work in high-dimensional HJ reachability to show how the dimensionality challenge can be alleviated via various general theoretical and application-specific insights.
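To make the notion of a backward reachable set concrete, the sketch below approximates one with discrete-time dynamic programming on a 1D grid, for the toy dynamics $\dot{x} = u$ with $|u| \le 1$ and the target $|x| \le 0.5$. It is not the Level Set Toolbox or its GPU implementation, which solve the HJ PDE with level set methods; the function name `backward_reachable_tube`, the three-valued control set, and all parameters are hypothetical. States with $V \le 0$ after 10 steps of $\Delta t = 0.1$ can reach the target within one time unit, so the set should be roughly $|x| \le 1.5$.

```python
import numpy as np


def backward_reachable_tube(grid, target_fn, controls, dt, steps):
    """Discrete-time DP approximation of a backward reachable tube for x_dot = u.

    target_fn(x) <= 0 defines the target. On return, V <= 0 marks grid states
    that can reach the target within steps * dt time units.
    """
    l = target_fn(grid)
    V = l.copy()
    for _ in range(steps):
        # Value after holding each control for one step; np.interp clamps
        # queries beyond the grid edges to the edge values.
        candidates = [np.interp(grid + dt * u, grid, V) for u in controls]
        # Either the state is already in the target, or some control leads to
        # a state that reaches the target in the remaining time.
        V = np.minimum(l, np.min(candidates, axis=0))
    return V


# Example: x_dot = u with |u| <= 1, target |x| <= 0.5, horizon 10 * 0.1 = 1.
grid = np.linspace(-3.0, 3.0, 121)          # grid spacing 0.05
V = backward_reachable_tube(grid, lambda x: np.abs(x) - 0.5,
                            controls=(-1.0, 0.0, 1.0), dt=0.1, steps=10)
reachable = grid[V <= 0.0]
```

The grid spacing is chosen so that one step of each control lands exactly on a grid point; on coarser or irregular grids the interpolation adds numerical diffusion, which is one reason dedicated level set tools are used in practice.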

SY · Jul 4, 2023
Maximizing Seaweed Growth on Autonomous Farms: A Dynamic Programming Approach for Underactuated Systems Navigating on Uncertain Ocean Currents

Matthias Killer, Marius Wiggert, Hanna Krasowski et al. · mit

Seaweed biomass presents a substantial opportunity for climate mitigation, yet to realize its potential, farming must be expanded to the vast open oceans. However, in the open ocean neither anchored farming nor floating farms with powerful engines are economically viable. Thus, one potential solution is to use farms that operate by going with the flow, utilizing minimal propulsion to strategically leverage beneficial ocean currents. In this work, we focus on low-power autonomous seaweed farms and design controllers that maximize seaweed growth by taking advantage of ocean currents. We first introduce a Dynamic Programming (DP) formulation to solve for the growth-optimal value function when the true currents are known. However, in reality only short-term imperfect forecasts with increasing uncertainty are available. Hence, we present three additional extensions. First, we use frequent replanning to mitigate forecast errors. Second, to optimize for long-term growth, we extend the value function beyond the forecast horizon by estimating the expected future growth based on seasonal average currents. Third, we introduce a discounted finite-time DP formulation to account for the increasing uncertainty in future ocean current estimates. We empirically evaluate our approach with 30-day simulations of farms in realistic ocean conditions. Our method achieves 95.8% of the best possible growth using only 5-day forecasts. This demonstrates that low-power propulsion is a promising method to operate autonomous seaweed farms in real-world conditions.
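A discounted finite-time DP of the kind described above can be sketched on a toy 1D chain of ocean cells. Everything here is a hypothetical stand-in, not the paper's model: a constant current `drift` pushes the farm downstream, the thrust choices in `thrusts` can at best cancel it, `growth` is the per-step growth in each cell, and `terminal_value` plays the role of the seasonal-average estimate of growth beyond the forecast horizon.

```python
import numpy as np


def discounted_growth_dp(growth, drift, thrusts, horizon, gamma, terminal_value):
    """Backward DP: V[t, i] = max_u growth[i] + gamma * V[t + 1, next(i, u)].

    Toy model (hypothetical): next(i, u) = clip(i + drift + u) on a 1D chain.
    terminal_value[i] stands in for expected growth beyond the forecast horizon.
    Returns the value table V[t, i] and the greedy thrust policy pi[t, i].
    """
    growth = np.asarray(growth, dtype=float)
    n = len(growth)
    cells = np.arange(n)
    V = np.zeros((horizon + 1, n))
    pi = np.zeros((horizon, n), dtype=int)
    V[horizon] = terminal_value
    for t in range(horizon - 1, -1, -1):
        # Q[k, i]: growth now plus discounted value after applying thrusts[k] in cell i.
        Q = np.array([
            growth + gamma * V[t + 1, np.clip(cells + drift + u, 0, n - 1)]
            for u in thrusts
        ])
        best = np.argmax(Q, axis=0)   # ties go to the first listed thrust
        V[t] = Q[best, cells]
        pi[t] = np.asarray(thrusts)[best]
    return V, pi


# Example: growth is only possible in cell 4; a current of +1 cell/step can be
# cancelled exactly by thrust -1, so the farm can hold station there.
growth = np.zeros(8)
growth[4] = 1.0
V, pi = discounted_growth_dp(growth, drift=1, thrusts=(-1, 0), horizon=3,
                             gamma=1.0, terminal_value=np.zeros(8))
```

With replanning, one would recompute this table whenever a new forecast arrives and apply only `pi[0]`; setting `gamma < 1` down-weights steps far into the forecast, where the current estimates are least reliable.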

SY · Mar 6, 2018
Planning, Fast and Slow: A Framework for Adaptive Real-Time Safe Trajectory Planning

David Fridovich-Keil, Sylvia L. Herbert, Jaime F. Fisac et al.

Motion planning is an extremely well-studied problem in the robotics community, yet existing work largely falls into one of two categories: computationally efficient but with few if any safety guarantees, or able to give stronger guarantees but at high computational cost. This work builds on a recent development called FaSTrack in which a slow offline computation provides a modular safety guarantee for a faster online planner. We introduce the notion of "meta-planning" in which a refined offline computation enables safe switching between different online planners. This provides autonomous systems with the ability to adapt motion plans to a priori unknown environments in real-time as sensor measurements detect new obstacles, and the flexibility to maneuver differently in the presence of obstacles than they would in free space, all while maintaining a strict safety guarantee. We demonstrate the meta-planning algorithm both in simulation and in hardware using a small Crazyflie 2.0 quadrotor.

SY · Jun 3
Characterization and Analysis of Emergency Landing Flight Envelopes with Graded Safety Specifications

Chams Eddine Mballo, Bryce L. Ferguson, Inkyu Jang et al.

Emergency landing flight envelope analysis traditionally adopts a binary notion of safety, whereby a trajectory is safe only if state constraints are satisfied pointwise in time. In practice, ensuring a successful landing requires recognizing that aircraft operation spans a continuum in the state space from the nominal to the critical regime. Between these regimes lies a degraded regime of states outside nominal operation that may be visited only for limited durations. Safety is therefore inherently graded, in the sense that limited exposure to degraded states may be tolerated, and must be assessed using a trajectory-dependent criterion rather than a purely pointwise-in-time one. This paper develops a Hamilton-Jacobi reachability framework for analyzing emergency landing flight envelopes under this graded notion of safety. Safety is encoded through a soft constraint defined by a designer-specified continuous violation cost function that assigns zero cost in the nominal regime and larger cost to more safety-critical off-nominal states. We introduce a general class of state- and time-dependent violation cost functions and establish monotonicity and continuity properties that characterize how the flight envelope varies with the cost of off-nominal operation. These results provide a principled sensitivity analysis linking safety conservativeness to operational capability. Building on this analysis, we propose a synthesis algorithm for parameterized violation cost functions in this class. The algorithm provably converges to the least conservative parameter under which a prescribed off-nominal safety requirement is satisfied. Numerical results for a fixed-wing emergency landing scenario under propulsion failure demonstrate the sensitivity properties and validate the algorithm.

MA · Mar 21, 2016
Safe Sequential Path Planning of Multi-Vehicle Systems via Double-Obstacle Hamilton-Jacobi-Isaacs Variational Inequality

Mo Chen, Jaime F. Fisac, Shankar Sastry et al.

We consider the problem of planning trajectories for a group of $N$ vehicles, each aiming to reach its own target set while avoiding danger zones of other vehicles. Analyzing such problems is of great practical importance, especially given the growing interest in utilizing unmanned aircraft systems for civil purposes. The direct solution of this problem by solving a single-obstacle Hamilton-Jacobi-Isaacs (HJI) variational inequality (VI) is numerically intractable due to the exponential scaling of computation complexity with problem dimensionality. Furthermore, the single-obstacle HJI VI cannot directly handle situations in which vehicles do not have a common scheduled arrival time. Instead, we perform sequential path planning by considering vehicles in order of priority, modeling higher-priority vehicles as time-varying obstacles for lower-priority vehicles. To do this, we solve a double-obstacle HJI VI which allows us to obtain the reach-avoid set, defined as the set of states from which a vehicle can reach its target while staying within a time-varying state constraint set. From the solution of the double-obstacle HJI VI, we can also extract the latest start time and the optimal control for each vehicle. This is a first application of the double-obstacle HJI VI which can handle systems with time-varying dynamics, target sets, and state constraint sets, and results in computation complexity that scales linearly, as opposed to exponentially, with the number of vehicles in consideration.
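The priority-ordered idea can be illustrated with a discrete analogue of the reach-avoid computation. The sketch below is a hypothetical grid-world stand-in for the double-obstacle HJI VI, not the paper's solver: a vehicle in a 1D corridor of cells moves at most one cell per step, a higher-priority vehicle is modeled as a time-varying set of blocked cells `obstacle_at(t)`, and a backward recursion computes, for each start time $t$, the set of cells from which the target can be reached by the horizon without entering a blocked cell. The latest start time falls out of the same table.

```python
def reach_avoid_sets(n_cells, target, obstacle_at, horizon, moves=(-1, 0, 1)):
    """R[t] = cells from which, starting at time t, the target can be reached
    by `horizon` while never occupying a cell in obstacle_at(time).

    Simplification: only cell occupancy is checked (no swap collisions).
    """
    R = [set() for _ in range(horizon + 1)]
    R[horizon] = {x for x in target if x not in obstacle_at(horizon)}
    for t in range(horizon - 1, -1, -1):
        blocked = obstacle_at(t)
        for x in range(n_cells):
            if x in blocked:
                continue                       # avoid: occupied at time t
            if x in target:
                R[t].add(x)                    # reach: already at the target
                continue
            if any(0 <= x + u < n_cells and x + u in R[t + 1] for u in moves):
                R[t].add(x)                    # some move keeps the vehicle in the set
    return R


def latest_start_time(R, cell):
    """Largest t with cell in R[t], or None if the target is never reachable."""
    times = [t for t, cells in enumerate(R) if cell in cells]
    return max(times) if times else None


# Example: 6-cell corridor, target at cell 5; a higher-priority vehicle
# occupies cell 3 until t = 4 and then leaves.
R = reach_avoid_sets(6, {5}, lambda t: {3} if t < 4 else set(), horizon=10)
```

Here cell 0 can still reach the target from $t = 0$ by waiting for the higher-priority vehicle to clear cell 3, and its latest start time is $t = 5$ (five moves to arrive exactly at the horizon). The next vehicle in the priority order would treat this vehicle's planned occupancy as its own `obstacle_at`.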

SY · Oct 4, 2016
Multi-Vehicle Collision Avoidance via Hamilton-Jacobi Reachability and Mixed Integer Programming

Mo Chen, Jennifer C. Shih, Claire J. Tomlin

Multi-agent differential games are important and useful tools for analyzing many practical problems. With the recent surge of interest in using UAVs for civil purposes, the importance and urgency of developing tractable multi-agent analysis techniques that provide safety and performance guarantees is at an all-time high. Hamilton-Jacobi (HJ) reachability has successfully provided safety guarantees to small-scale systems and is flexible in terms of system dynamics. However, the exponential complexity scaling of HJ reachability prevents its direct application to large-scale problems when the number of vehicles is greater than two. In this paper, we overcome the scalability limitations of HJ reachability by using a mixed integer program that exploits the properties of HJ solutions to provide higher-level control logic. Our proposed method provides safety guarantees for three-vehicle systems, a previously intractable task for HJ reachability, without incurring significant additional computation cost. Furthermore, our method is scalable beyond three vehicles and performs significantly better by several metrics than an extension of pairwise collision avoidance to multi-vehicle collision avoidance. We demonstrate our proposed method in simulations.

SY · Mar 21, 2016
Safe Platooning of Unmanned Aerial Vehicles via Reachability

Mo Chen, Qie Hu, Casey Mackin et al.

Recently, there has been immense interest in using unmanned aerial vehicles (UAVs) for civilian operations such as package delivery, firefighting, and fast disaster response. As a result, UAV traffic management systems are needed to support potentially thousands of UAVs flying simultaneously in the airspace, in order to ensure their liveness and safety requirements are met. Hamilton-Jacobi (HJ) reachability is a powerful framework for providing conditions under which these requirements can be met, and for synthesizing the optimal controller for meeting them. However, due to the curse of dimensionality, HJ reachability is only tractable for a small number of vehicles if their set of maneuvers is unrestricted. In this paper, we define a platoon to be a group of UAVs in a single-file formation. We model each vehicle as a hybrid system with modes corresponding to its role in the platoon, and specify the set of allowed maneuvers in each mode to make the analysis tractable. We propose several liveness controllers based on HJ reachability, and wrap a safety controller, also based on HJ reachability, around the liveness controllers. For a single altitude range, our approach guarantees safety for one safety breach; in the unlikely event of multiple safety breaches, safety can be guaranteed over multiple altitude ranges. We demonstrate the satisfaction of liveness and safety requirements through simulations of three common scenarios.

SY · Mar 18, 2019
Reachability-Based Safety Guarantees using Efficient Initializations

Sylvia L. Herbert, Shromona Ghosh, Somil Bansal et al.

Hamilton-Jacobi-Isaacs (HJI) reachability analysis is a powerful tool for analyzing the safety of autonomous systems. This analysis is computationally intensive and typically performed offline. Online, however, the autonomous system may experience changes in system dynamics, external disturbances, and/or the surrounding environment, requiring updated safety guarantees. Rather than restarting the safety analysis, we propose a method of "warm-start" reachability, which uses a user-defined initialization (typically the previously computed solution). By starting with an HJI function that is closer to the solution than the standard initialization, convergence may take fewer iterations. In this paper we prove that warm-starting will result in guaranteed conservative solutions by over-approximating the states that must be avoided to maintain safety. We additionally prove that for many common problem formulations, warm-starting will result in exact solutions. We demonstrate our method on several illustrative examples with a double integrator, and also on a more practical example with a 10D quadcopter model that experiences changes in mass and disturbances and must update its safety guarantees accordingly. We compare our approach to standard reachability and a recently proposed "discounted" reachability method, and find for our examples that warm-starting is 1.6 times faster than standard and 6.2 times faster than (untuned) discounted reachability.

SY · Jun 21, 2016
Plug-and-Play Model Predictive Control for Load Shaping and Voltage Control in Smart Grids

Caroline Le Floch, Somil Bansal, Claire J. Tomlin et al.

This paper presents a predictive controller for handling plug-and-play (P&P) charging requests of flexible loads in a distribution system. We define two types of flexible loads: (i) deferrable loads that have a fixed power profile but can be deferred in time and (ii) shapeable loads that have flexible power profiles but fixed energy requests, such as Plug-in Electric Vehicles (PEVs). The proposed method uses a hierarchical control scheme based on a model predictive control (MPC) formulation for minimizing the global system cost. The first stage computes a reachable reference that trades off deviation from the nominal voltage with the required generation control. The second stage uses a price-based objective to aggregate flexible loads and provide load shaping services, while satisfying system constraints and users' preferences at all times. It is shown that the proposed controller is recursively feasible under specific conditions, i.e., the flexible load demands are satisfied and bus voltages remain within the desired limits. Finally, the proposed scheme is illustrated on a 55-bus radial distribution network.

SY · Jun 9, 2016
Secure Estimation based Kalman Filter for Cyber-Physical Systems against Adversarial Attacks

Young Hwan Chang, Qie Hu, Claire J. Tomlin

Cyber-physical systems are found in many applications such as power networks, manufacturing processes, and air and ground transportation systems. Maintaining security of these systems under cyber attacks is an important and challenging task, since these attacks can be erratic and thus difficult to model. Secure estimation problems study how to estimate the true system states when measurements are corrupted and/or control inputs are compromised by attackers. The authors in [1] proposed a secure estimation method when the set of attacked nodes (sensors, controllers) is fixed. In this paper, we extend these results to scenarios in which the set of attacked nodes can change over time. We formulate this secure estimation problem as the classical error correction problem [2] and show that accurate decoding can be guaranteed under a certain condition. Furthermore, we propose a secure estimation method that combines our proposed secure estimator with the Kalman Filter for improved practical performance. Finally, we demonstrate the performance of our method through simulations of two scenarios where an unmanned aerial vehicle is under adversarial attack.
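The link to error correction can be seen in a brute-force toy version: if at most $s$ of $p$ sensors are corrupted, then some subset of $p - s$ sensors is attack-free and its measurements are mutually consistent. The sketch below (hypothetical names; static, noise-free measurements $y = Cx + e$ with no dynamics) fits $x$ on every such subset and keeps the most consistent one. It is not the estimator from this paper, which handles a time-varying attack set, and its cost grows combinatorially with $p$.

```python
import itertools

import numpy as np


def secure_estimate(C, y, max_attacked):
    """Brute-force estimate of x from y = C x + e, where e has at most
    `max_attacked` nonzero entries (toy, noise-free setting).

    Returns the estimate and the sensors excluded from the best-fitting subset.
    """
    p = C.shape[0]
    best = None
    for kept in itertools.combinations(range(p), p - max_attacked):
        rows = list(kept)
        x_hat, *_ = np.linalg.lstsq(C[rows], y[rows], rcond=None)
        residual = np.linalg.norm(C[rows] @ x_hat - y[rows])
        # An attack-free subset is consistent, so its residual is (near) zero.
        if best is None or residual < best[0]:
            best = (residual, x_hat, rows)
    _, x_hat, rows = best
    return x_hat, sorted(set(range(p)) - set(rows))


# Example: 5 sensors, 2 states, sensor 2 is corrupted by the attacker.
C = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0], [2.0, 1.0]])
x_true = np.array([1.0, 2.0])
y = C @ x_true
y[2] += 10.0
x_hat, suspects = secure_estimate(C, y, max_attacked=1)
```

Dropping sensor 2 leaves four mutually consistent measurements, so the state is recovered and sensor 2 is reported as the suspect; every subset that keeps sensor 2 has a strictly positive residual for this choice of `C`.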

AI · Oct 10, 2022
Optimality Guarantees for Particle Belief Approximation of POMDPs

Michael H. Lim, Tyler J. Becker, Mykel J. Kochenderfer et al.

Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood weighting have shown practical effectiveness, a general theory characterizing the approximation error of the particle filtering techniques that these algorithms use has not previously been proposed. Our main contribution is bounding the error between any POMDP and its corresponding finite sample particle belief MDP (PB-MDP) approximation. This fundamental bridge between PB-MDPs and POMDPs allows us to adapt any sampling-based MDP algorithm to a POMDP by solving the corresponding particle belief MDP, thereby extending the convergence guarantees of the MDP algorithm to the POMDP. Practically, this is implemented by using the particle filter belief transition model as the generative model for the MDP solver. While this requires access to the observation density model from the POMDP, it only increases the transition sampling complexity of the MDP solver by a factor of $\mathcal{O}(C)$, where $C$ is the number of particles. Thus, when combined with sparse sampling MDP algorithms, this approach can yield algorithms for POMDPs that have no direct theoretical dependence on the size of the state and observation spaces. In addition to our theoretical contribution, we perform five numerical experiments on benchmark POMDPs to demonstrate that a simple MDP algorithm adapted using PB-MDP approximation, Sparse-PFT, achieves performance competitive with other leading continuous observation POMDP solvers.
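The particle-belief transition that turns a POMDP into a PB-MDP can be sketched as a generative model: draw a state from the current particle belief to simulate an observation, propagate all $C$ particles, and reweight them by the observation density. The toy 1D dynamics, noise levels, and reward below are hypothetical, and details such as how the observation is generated may differ from the paper's construction. Each step costs $C$ transition samples plus $C$ density evaluations, which is the $\mathcal{O}(C)$ factor mentioned above.

```python
import numpy as np


def pb_mdp_step(particles, weights, action, rng, transition, obs_sample, obs_density, reward):
    """One sample from a particle-belief MDP generative model (sketch).

    particles: (C,) states; weights: (C,) normalized weights.
    Returns (next_particles, next_weights, belief_reward).
    """
    C = len(particles)
    # Simulate an observation from a state drawn out of the current belief.
    i = rng.choice(C, p=weights)
    obs = obs_sample(transition(particles[i], action, rng), rng)
    # Belief reward: weighted average of the state reward.
    belief_reward = float(np.sum(weights * np.array([reward(s, action) for s in particles])))
    # Propagate every particle, then reweight by the observation likelihood.
    next_particles = np.array([transition(s, action, rng) for s in particles])
    next_weights = weights * obs_density(obs, next_particles)
    next_weights /= next_weights.sum()
    return next_particles, next_weights, belief_reward


# Hypothetical toy model: noisy 1D motion with noisy position observations.
def transition(s, a, rng):
    return s + a + rng.normal(0.0, 0.1)


def obs_sample(s, rng):
    return s + rng.normal(0.0, 0.5)


def obs_density(o, states):
    # Unnormalized Gaussian likelihood; the constant cancels after normalization.
    return np.exp(-0.5 * ((o - states) / 0.5) ** 2)


def reward(s, a):
    return -abs(s)


rng = np.random.default_rng(0)
C = 100
particles = rng.normal(0.0, 1.0, size=C)
weights = np.full(C, 1.0 / C)
particles, weights, r = pb_mdp_step(particles, weights, 0.5, rng,
                                    transition, obs_sample, obs_density, reward)
```

Any sampling-based MDP solver that only needs a generative model can call `pb_mdp_step` in place of a state transition, which is the sense in which an MDP algorithm is "adapted" to the POMDP.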

SY · Mar 22, 2016
Building Model Identification during Regular Operation - Empirical Results and Challenges

Qie Hu, Frauke Oldewurtel, Maximilian Balandat et al.

The inter-temporal consumption flexibility of commercial buildings can be harnessed to improve the energy efficiency of buildings, or to provide ancillary services to the power grid. To do so, a predictive model of the building's thermal dynamics is required. In this paper, we identify a physics-based model of a multi-purpose commercial building including its heating, ventilation and air conditioning system during regular operation. We present our empirical results and show that large uncertainties in internal heat gains, due to occupancy and equipment, present several challenges in utilizing the building model for long-term prediction. In addition, we show that by learning these uncertain loads online and dynamically updating the building model, prediction accuracy is improved significantly.

SY · Mar 18, 2022
Infinite-Horizon Reach-Avoid Zero-Sum Games via Deep Reinforcement Learning

Jingqi Li, Donggun Lee, Somayeh Sojoudi et al.

In this paper, we consider the infinite-horizon reach-avoid zero-sum game problem, where the goal is to find a set in the state space, referred to as the reach-avoid set, such that the system starting at a state therein could be controlled to reach a given target set without violating constraints under the worst-case disturbance. We address this problem by designing a new value function with a contracting Bellman backup, where the super-zero level set, i.e., the set of states where the value function is evaluated to be non-negative, recovers the reach-avoid set. Building upon this, we prove that the proposed method can be adapted to compute the viability kernel, or the set of states which could be controlled to satisfy given constraints, and the backward reachable set, or the set of states that could be driven towards a given target set. Finally, we propose to alleviate the curse of dimensionality issue in high-dimensional problems by extending Conservative Q-Learning, a deep reinforcement learning technique, to learn a value function such that the super-zero level set of the learned value function serves as a (conservative) approximation to the reach-avoid set. Our theoretical and empirical results suggest that the proposed method could reliably learn the reach-avoid set and the optimal control policy even with neural network approximation.
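Why a discounted reach-avoid backup is a contraction can be checked numerically on a tiny chain. The sketch uses the super-zero convention ($r \ge 0$ on the target, $c \ge 0$ where constraints hold) and the backup $V \leftarrow (1-\gamma)\min(r, c) + \gamma \min\{c, \max(r, \max_u V(f(x,u)))\}$. This is an assumed form, not necessarily the exact value function designed in the paper, and it omits the disturbance player. Because min, max, and the max over successors are all non-expansive in the sup norm, the backup is a $\gamma$-contraction, so value iteration converges to the same fixed point from any initialization.

```python
import numpy as np


def discounted_reach_avoid_vi(r, c, successors, gamma=0.95, V0=None, tol=1e-10, max_iter=10_000):
    """Value iteration for an assumed discounted reach-avoid backup (super-zero convention).

    r[i] >= 0 iff cell i is in the target; c[i] >= 0 iff cell i satisfies the constraints.
    successors[i] lists the cells reachable from i in one step (one per control).
    """
    r = np.asarray(r, dtype=float)
    c = np.asarray(c, dtype=float)
    V = np.zeros_like(r) if V0 is None else np.asarray(V0, dtype=float)
    for _ in range(max_iter):
        # Best value the controller can steer to in one step.
        best_next = np.array([max(V[j] for j in succ) for succ in successors])
        V_new = (1 - gamma) * np.minimum(r, c) + gamma * np.minimum(c, np.maximum(r, best_next))
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


# Example: 10-cell corridor, target = cells 8 and 9, obstacle at cell 5.
n = 10
cells = np.arange(n)
r = cells - 8.0                                  # >= 0 only for cells 8 and 9
c = np.where(cells == 5, -1.0, 1.0)              # < 0 only at the obstacle
successors = [[max(i - 1, 0), i, min(i + 1, n - 1)] for i in range(n)]
V = discounted_reach_avoid_vi(r, c, successors)
V_other_init = discounted_reach_avoid_vi(r, c, successors, V0=np.full(n, 5.0))
```

The target cells end with $V \ge 0$, the obstacle cell ends with $V < 0$, and both initializations reach the same fixed point, as the contraction argument predicts. A learned Q-function would replace the explicit `max` over successors in high dimensions.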

SY · Mar 22, 2016
Secure State Estimation for Nonlinear Power Systems under Cyber Attacks

Qie Hu, Dariush Fooladivanda, Young Hwan Chang et al.

This paper focuses on securely estimating the state of a nonlinear dynamical system from a set of corrupted measurements. In particular, we consider two broad classes of nonlinear systems, and propose a technique which enables us to perform secure state estimation for such nonlinear systems. We then provide guarantees on the achievable state estimation error against arbitrary corruptions, and analytically characterize the number of errors that can be perfectly corrected by a decoder. To illustrate how the proposed nonlinear estimation approach can be applied to practical systems, we focus on secure estimation for the wide area control of an interconnected power system under cyber-physical attacks and communication failures, and propose a secure estimator for the power system. Finally, we numerically show that the proposed secure estimation algorithm enables us to reconstruct the attack signals accurately.

SY · Aug 23, 2022
Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions

Fernando Castañeda, Jason J. Choi, Wonsuhk Jung et al.

Learning-based control has recently shown great efficacy in performing complex tasks for various applications. However, to deploy it in real systems, it is of vital importance to guarantee the system will stay safe. Control Barrier Functions (CBFs) offer mathematical tools for designing safety-preserving controllers for systems with known dynamics. In this article, we first introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers using Gaussian Process (GP) regression to close the gap between an approximate mathematical model and the real system, which results in a second-order cone program (SOCP)-based control design. We then present the pointwise feasibility conditions of the resulting safety controller, highlighting the level of richness that the available system information must meet to ensure safety. We use these conditions to devise an event-triggered online data collection strategy that ensures the recursive feasibility of the learned safety controller. Our method works by constantly reasoning about whether the current information is sufficient to ensure safety or if new measurements under active safe exploration are required to reduce the uncertainty. As a result, our proposed framework can guarantee the forward invariance of the safe set defined by the CBF with high probability, even if it contains a priori unexplored regions. We validate the proposed framework in two numerical simulation experiments.
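A standard CBF-based safety filter is a quadratic program that minimally modifies a reference input. With a single constraint it has a closed-form solution, sketched below for a hypothetical single integrator $\dot{x} = u$ that must stay outside a disk, with $h(x) = \|x - x_o\|^2 - \rho^2$ and the condition $\nabla h(x)^\top u + \alpha h(x) \ge 0$. The paper's controller instead solves an SOCP that accounts for GP model uncertainty; this sketch assumes the dynamics are known exactly, and the names and parameters are illustrative.

```python
import numpy as np


def cbf_safety_filter(x, u_ref, center, radius, alpha=1.0):
    """Closed-form CBF-QP for x_dot = u with safe set h(x) = ||x - center||^2 - radius^2 >= 0.

    Solves  min ||u - u_ref||^2  s.t.  grad_h(x) . u + alpha * h(x) >= 0.
    Assumes x != center, so the constraint gradient is nonzero.
    """
    h = float(np.dot(x - center, x - center) - radius**2)
    a = 2.0 * (x - center)                      # gradient of h at x
    slack = float(a @ u_ref + alpha * h)
    if slack >= 0.0:
        return u_ref.copy()                     # reference already satisfies the CBF condition
    return u_ref - slack / float(a @ a) * a     # project onto the constraint boundary


# Example: the reference drives straight at a unit disk centered at the origin.
x = np.array([2.0, 0.0])
u = cbf_safety_filter(x, u_ref=np.array([-5.0, 0.0]), center=np.zeros(2), radius=1.0)
```

Here the filter slows the approach from $-5$ to $-0.75$ along the first axis, exactly enough to satisfy the barrier condition with equality. In the GP-based setting, the affine constraint becomes a second-order cone constraint because the model error enters with an uncertainty-dependent margin.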

SY · May 10, 2017
Provably Safe and Robust Drone Routing via Sequential Path Planning: A Case Study in San Francisco and the Bay Area

Mo Chen, Somil Bansal, Ken Tanabe et al.

Provably safe and scalable multi-vehicle path planning is an important and urgent problem due to the expected increase of automation in civilian airspace in the near future. Hamilton-Jacobi (HJ) reachability is an ideal tool for analyzing such safety-critical systems and has been successfully applied to several small-scale problems. However, a direct application of HJ reachability to large-scale systems is often intractable because of its exponentially-scaling computation complexity with respect to system dimension, also known as the "curse of dimensionality". To overcome this problem, the sequential path planning (SPP) method, which assigns strict priorities to vehicles, was previously proposed; SPP allows multi-vehicle path planning to be done with a linearly-scaling computation complexity. In this work, we demonstrate the potential of the SPP algorithm for large-scale systems. In particular, we simulate large-scale multi-vehicle systems in two different urban environments, a city environment and a multi-city environment, and use the SPP algorithm for trajectory planning. SPP is able to efficiently design collision-free trajectories in both environments despite the presence of disturbances in vehicles' dynamics. To ensure a safe transition of vehicles to their destinations, our method automatically allocates space-time reservations to vehicles while accounting for the magnitude of disturbances such as wind in a provably safe way. Our simulation results show an intuitive multi-lane structure in airspace, where the number of lanes and the distance between the lanes depend on the size of disturbances and other problem parameters.

SY · Jun 13, 2016
Secure Estimation for Unmanned Aerial Vehicles against Adversarial Cyber Attacks

Qie Hu, Young Hwan Chang, Claire J. Tomlin

In the coming years, usage of Unmanned Aerial Vehicles (UAVs) is expected to grow tremendously. Maintaining security of UAVs under cyber attacks is an important yet challenging task, as these attacks are often erratic and difficult to predict. Secure estimation problems study how to estimate the states of a dynamical system from a set of noisy and maliciously corrupted sensor measurements. The fewer assumptions that an estimator makes about the attacker, the larger the set of attacks it can protect the system against. In this paper, we focus on sensor attacks on UAVs and attempt to design a secure estimator for linear time-invariant systems based on as few assumptions about the attackers as possible. We propose a computationally efficient estimator that protects the system against arbitrary and unbounded attacks, where the set of attacked sensors can also change over time. In addition, we propose to combine our secure estimator with a Kalman Filter for improved practical performance and demonstrate its effectiveness through simulations of two scenarios where a UAV is under adversarial cyber attack.
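The Kalman Filter half of that combination is, presumably, the standard linear filter. A minimal predict/update step for $x_{k+1} = A x_k + w_k$, $y_k = C x_k + v_k$ is sketched below; how the paper couples it with the secure estimator is not reproduced here, and the noise covariances in the example are arbitrary placeholders.

```python
import numpy as np


def kalman_step(x, P, y, A, C, Q, R):
    """One predict/update step of a linear Kalman filter with
    x_{k+1} = A x_k + w_k, y_k = C x_k + v_k, w ~ N(0, Q), v ~ N(0, R)."""
    # Predict the next state and its covariance.
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Correct with the measurement via the Kalman gain.
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(len(x)) - K @ C) @ P_pred
    return x_new, P_new


# Example: estimate a constant scalar position of 5.0 from repeated measurements.
A = C = np.eye(1)
Q, R = 0.01 * np.eye(1), np.eye(1)
x, P = np.zeros(1), np.eye(1)
for _ in range(50):
    x, P = kalman_step(x, P, np.array([5.0]), A, C, Q, R)
```

On its own, this filter assumes Gaussian noise and has no protection against a maliciously corrupted `y`, which is the gap a secure estimator is meant to close.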

SY · Mar 18, 2016
Model Comparison of a Data-Driven and a Physical Model for Simulating HVAC Systems

Datong Zhou, Qie Hu, Claire J. Tomlin

Commercial buildings are responsible for a large fraction of energy consumption in developed countries, and therefore are targets of energy efficiency programs. Because of the large inherent thermal inertia of buildings, their power consumption can be flexibly scheduled without compromising occupant comfort. This temporal flexibility offers opportunities for the provision of frequency regulation to support grid stability. To realize energy savings and frequency regulation, it is of prime importance to identify a realistic model for the temperature dynamics of a building. We identify a low-dimensional data-driven model and a high-dimensional physics-based model for different spatial granularities and temporal seasons based on a case study of an entire floor of Sutardja Dai Hall, an office building on the University of California, Berkeley campus. A comparison of these contrasting models shows that, despite the higher forecasting accuracy of the physics-based model, both models perform almost equally well for energy-efficient control. We conclude that the data-driven model is more amenable to controller design due to its low complexity, and could serve as a substitute for highly complex physics-based models with an insignificant loss of prediction accuracy for many applications. On the other hand, our physics-based approach is more suitable for modeling buildings with finer spatial granularities.
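A low-dimensional data-driven thermal model can be as simple as a first-order linear model identified by least squares. The sketch below assumes the structure $T_{k+1} = a T_k + b u_k + c T^{\text{out}}_k + d$, where $u_k$ is an HVAC input; this structure, the name `fit_thermal_arx`, and the parameter values in the example are hypothetical, not the model identified for Sutardja Dai Hall.

```python
import numpy as np


def fit_thermal_arx(T_zone, T_out, u):
    """Least-squares estimate of [a, b, c, d] in
    T_zone[k+1] = a*T_zone[k] + b*u[k] + c*T_out[k] + d."""
    X = np.column_stack([T_zone[:-1], u[:-1], T_out[:-1], np.ones(len(T_zone) - 1)])
    theta, *_ = np.linalg.lstsq(X, T_zone[1:], rcond=None)
    return theta


# Example with synthetic, noise-free data from known (made-up) parameters.
rng = np.random.default_rng(1)
n = 200
u = rng.uniform(0.0, 1.0, n)                 # e.g. normalized cooling effort
T_out = 15.0 + 5.0 * rng.standard_normal(n)  # outdoor temperature
T_zone = np.empty(n)
T_zone[0] = 21.0
for k in range(n - 1):
    T_zone[k + 1] = 0.9 * T_zone[k] - 0.5 * u[k] + 0.08 * T_out[k] + 0.4
theta = fit_thermal_arx(T_zone, T_out, u)
```

With noise-free data the fit recovers the generating parameters; on real building data, unmodeled internal heat gains show up as structured residuals, which is where a model of this size loses accuracy relative to a physics-based one.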

SY · Feb 19, 2017
Stability Analysis of Wholesale Electricity Markets under Dynamic Consumption Models and Real-Time Pricing

Datong P. Zhou, Mardavij Roozbehani, Munther A. Dahleh et al.

This paper analyzes stability conditions for wholesale electricity markets under real-time retail pricing and realistic consumption models with memory, which explicitly take into account previous electricity prices and consumption levels. By passing on the current retail price of electricity from supplier to consumer and feeding the observed consumption back to the supplier, a closed-loop dynamical system for electricity prices and consumption arises whose stability is to be investigated. Under mild assumptions on the generation cost of electricity and consumers' backlog disutility functions, we show that, for consumer models with price memory only, market stability is achieved if the ratio between the consumers' marginal backlog disutility and the suppliers' marginal cost of supply remains below a fixed threshold. Further, consumer models with price and consumption memory can result in greater stability regions and faster convergence to the equilibrium compared to models with price memory alone, if consumption deviations from nominal demand are adequately penalized.

SY · Mar 16
A Forward Reachability Perspective on Control Barrier Functions and Discount Factors in Reachability Analysis

Jason J. Choi, Donggun Lee, Boyang Li et al.

Control invariant sets are crucial for various methods that aim to design safe control policies for systems whose state constraints must be satisfied over an indefinite time horizon. In this article, we explore the connections among reachability, control invariance, and Control Barrier Functions (CBFs). Unlike prior formulations based on backward reachability concepts, we establish a strong link between these three concepts by examining the inevitable Forward Reachable Tube (FRT), which is the set of states such that every trajectory reaching the FRT must have passed through a given initial set of states. First, our findings show that the inevitable FRT is a robust control invariant set if it has a continuously differentiable boundary. If the boundary is not differentiable, the FRT may lose invariance. We also show that any robust control invariant set including the initial set is a superset of the FRT if the boundary of the invariant set is differentiable. Next, we formulate a differential game between the control and disturbance, where the inevitable FRT is characterized by the zero-superlevel set of the value function. By incorporating a discount factor in the cost function of the game, the barrier constraint of the CBF naturally arises in the Hamilton-Jacobi (HJ) equation and determines the optimal policy. The resulting FRT value function serves as a CBF-like function, and conversely, any valid CBF is also a forward reachability value function. We further prove that any $C^1$ supersolution of the HJ equation for the FRT value functions is a valid CBF and characterizes a robust control invariant set that outer-approximates the FRT. Finally, building on this property, we devise a novel method that learns neural control barrier functions, which learn a control invariant superset of the FRT of a given initial set.

RO · Sep 13, 2023
Out of Distribution Detection via Domain-Informed Gaussian Process State Space Models

Alonso Marco, Elias Morley, Claire J. Tomlin

In order for robots to safely navigate in unseen scenarios using learning-based methods, it is important to accurately detect out-of-training-distribution (OoD) situations online. Recently, Gaussian process state-space models (GPSSMs) have proven useful to discriminate unexpected observations by comparing them against probabilistic predictions. However, the capability for the model to correctly distinguish between in- and out-of-training distribution observations hinges on the accuracy of these predictions, primarily affected by the class of functions the GPSSM kernel can represent. In this paper, we propose (i) a novel approach to embed existing domain knowledge in the kernel and (ii) an OoD online runtime monitor, based on receding-horizon predictions. Domain knowledge is provided in the form of a dataset, collected either in simulation or by using a nominal model. Numerical results show that the informed kernel yields better regression quality with smaller datasets, as compared to standard kernel choices. We demonstrate the effectiveness of the OoD monitor on a real quadruped navigating an indoor setting, which reliably classifies previously unseen terrains.
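The basic mechanism, comparing an observation against a probabilistic prediction, can be sketched with plain GP regression and a z-score threshold. This is a simplified, hypothetical illustration: it uses a standard squared-exponential kernel rather than a domain-informed one, a one-step regression rather than a GPSSM with receding-horizon predictions, and an arbitrary threshold of 3 standard deviations.

```python
import numpy as np


def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between 1D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_predict(x_train, y_train, x_query, noise_var=1e-4):
    """GP posterior mean and predictive variance (including observation noise)."""
    K = rbf(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_star = rbf(x_train, x_query)                       # shape (n_train, n_query)
    mean = k_star.T @ np.linalg.solve(K, y_train)
    var = rbf(x_query, x_query).diagonal() - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
    return mean, np.maximum(var, 0.0) + noise_var


def ood_flags(x_train, y_train, x_query, y_observed, threshold=3.0):
    """Flag observations whose z-score against the GP prediction exceeds `threshold`."""
    mean, var = gp_predict(x_train, y_train, x_query)
    return np.abs(y_observed - mean) / np.sqrt(var) > threshold


# Example: training data from sin(x); one consistent and one anomalous observation.
x_train = np.linspace(0.0, 5.0, 20)
y_train = np.sin(x_train)
x_query = np.array([2.5, 2.5])
y_observed = np.array([np.sin(2.5), np.sin(2.5) + 2.0])
flags = ood_flags(x_train, y_train, x_query, y_observed)
```

The quality of this monitor hinges on how well the kernel captures the true dynamics: a poorly matched kernel inflates the predictive variance or biases the mean, which is the motivation for embedding domain knowledge in the kernel.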

SYMar 17, 2017
Hedging Strategies for Load-Serving Entities in Wholesale Electricity Markets

Datong P. Zhou, Munther A. Dahleh, Claire J. Tomlin

Load-serving entities that procure electricity from the wholesale electricity market to serve end users face significant quantity and price risks due to the volatile nature of electricity demand and the quasi-fixed residential tariffs at which electricity is sold. This paper investigates strategies for load-serving entities to hedge against such price risks. Specifically, we compute profit-maximizing portfolios of forward contracts and call options as a function of the uncertain aggregate user demand. We compare the profit to the case of Demand Response, where users are offered monetary incentives to temporarily reduce their consumption during periods of supply shortages. Using smart meter data of residential customers in California, we simulate optimal portfolios and derive conditions under which Demand Response outperforms call options and forward contracts.

SYNov 6, 2017
Safe and Resilient Multi-vehicle Trajectory Planning Under Adversarial Intruder

Somil Bansal, Mo Chen, Claire J. Tomlin

Provably safe and scalable multi-vehicle trajectory planning is an important and urgent problem. Hamilton-Jacobi (HJ) reachability is an ideal tool for analyzing such safety-critical systems and has been successfully applied to several small-scale problems. However, a direct application of HJ reachability to multi-vehicle trajectory planning is often intractable due to the "curse of dimensionality." To overcome this problem, the sequential trajectory planning (STP) method, which assigns strict priorities to vehicles, was proposed; STP allows multi-vehicle trajectory planning to be done with computational complexity that scales linearly with the number of vehicles. However, if a vehicle not in the set of STP vehicles enters the system, or even worse, if this vehicle is an adversarial intruder, the previous formulation requires the entire system to replan, an intractable task for large-scale systems. In this paper, we make STP more practical by providing a new algorithm in which replanning is needed only for a fixed number of vehicles, irrespective of the total number of STP vehicles. Moreover, this number is a design parameter, which can be chosen based on the computational resources available at run time. We demonstrate this algorithm in a representative simulation of an urban airspace environment.

SOC-PHOct 25, 2018
Estimating Heterogeneous Treatment Effects in Residential Demand Response

Datong P. Zhou, Maximilian Balandat, Claire J. Tomlin

We evaluate the causal effect of hour-ahead price interventions on the reduction in residential electricity consumption using a data set from a large-scale experiment on 7,000 households in California. Using time-series prediction to estimate user-level counterfactuals, we obtain an average treatment effect of ~0.10 kWh (11%) per intervention and household. Next, we leverage causal decision trees to detect treatment effect heterogeneity across users by incorporating census data. These decision trees depart from classification and regression trees, as we intend to estimate a causal effect between treated and control units rather than perform outcome regression. We compare the performance of causal decision trees with a simpler but less accurate k-means clustering approach that naively detects heterogeneity in the feature space, confirming the superiority of causal decision trees. Lastly, we comment on how our methods to detect heterogeneity can be used to target households to improve cost efficiency.
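The counterfactual-based average treatment effect can be illustrated by comparing observed consumption during interventions with predicted counterfactual consumption. This is a minimal sketch that assumes the counterfactual predictions come from some time-series model that is not shown; the function name and the toy numbers are hypothetical, and the paper's estimator may aggregate differently (for example, first per household and then across households).

```python
import numpy as np


def average_treatment_effect(observed_kwh, counterfactual_kwh):
    """Estimate the mean reduction (kWh) and relative reduction per intervention.

    observed_kwh: consumption measured during treated hours.
    counterfactual_kwh: predicted consumption had no intervention occurred.
    """
    observed = np.asarray(observed_kwh, dtype=float)
    counterfactual = np.asarray(counterfactual_kwh, dtype=float)
    # Positive values mean the household consumed less than predicted.
    reductions = counterfactual - observed
    ate_kwh = reductions.mean()
    ate_relative = ate_kwh / counterfactual.mean()
    return ate_kwh, ate_relative


# Hypothetical treated hours for three households.
ate, rel = average_treatment_effect([0.80, 0.75, 0.90], [0.90, 0.85, 1.00])
```

The estimate is only as good as the counterfactual predictor: any systematic bias in the predictions passes directly into the treatment effect.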

ROSep 10, 2024
Gait Switching and Enhanced Stabilization of Walking Robots with Deep Learning-based Reachability: A Case Study on Two-link Walker

Xingpeng Xia, Jason J. Choi, Ayush Agrawal et al.

Learning-based approaches have recently shown notable success in legged locomotion. However, these approaches often lack accountability, necessitating empirical tests to determine their effectiveness. In this work, we are interested in designing a learning-based locomotion controller whose stability can be examined and guaranteed. This can be achieved by verifying regions of attraction (RoAs) of legged robots to their stable walking gaits. This is a non-trivial problem for legged robots due to their hybrid dynamics. Although previous work has shown the utility of Hamilton-Jacobi (HJ) reachability for solving this problem, its practicality was limited by its poor scalability. The core contribution of our work is the application of a deep learning-based HJ reachability solution to the hybrid legged robot dynamics, which overcomes the previous work's limitation. With the learned reachability solution, first, we can estimate a library of RoAs for various gaits. Second, we can design a one-step predictive controller that effectively stabilizes to an individual gait within the verified RoA. Finally, we can devise a strategy that switches gaits in response to external perturbations, with switching feasibility guided by the RoA analysis. We demonstrate our method in a two-link walker simulation, whose mathematical model is well established. Our method achieves greater stability than previous model-based methods, while ensuring transparency that was not present in the existing learning-based approaches.

SYApr 3
Inverse Safety Filtering: Inferring Constraints from Safety Filters for Decentralized Coordination

Minh Nguyen, Jingqi Li, Gechen Qu et al.

Safe multi-agent coordination in uncertain environments can benefit from learning constraints from other agents. Implicitly communicating safety constraints through actions is a promising approach, allowing agents to coordinate and maintain safety without expensive communication channels. This paper introduces an online method to infer constraints from observing the safety-filtered actions of other agents. We approach the problem by using safety filters to ensure forward safety and exploit their structure to work backwards and infer constraints. We provide sufficient conditions under which we can infer these constraints and prove that our inference method converges. This constraint inference procedure is coupled with a decentralized planning method that ensures safety when the constraint activation distance is sufficiently large. We then empirically validate our method with Monte Carlo simulations and hardware experiments with quadruped robots.

SYMar 26
From Global to Local: Hierarchical Probabilistic Verification for Reachability Learning

Ebonye Smith, Sampada Deglurkar, Jingqi Li et al.

Hamilton-Jacobi (HJ) reachability provides formal safety guarantees for nonlinear systems. However, it becomes computationally intractable in high-dimensional settings, motivating learning-based approximations that may introduce unsafe errors or overly optimistic safe sets. In this work, we propose a hierarchical probabilistic verification framework for reachability learning that bridges offline global certification and online local refinement. We first construct a coarse safe set using scenario optimization, providing an efficient global probabilistic certificate. We then introduce an online local refinement module that expands the certified safe set near its boundary by solving a sequence of convex programs, recovering regions excluded by the global verification. This refinement reduces conservatism while focusing computation on critical regions of the state space. We provide probabilistic safety guarantees for both the global and locally refined sets. Integrated with a switching mechanism between a learned reachability policy and a model-based controller, the proposed framework improves success rates in goal-reaching tasks with safety constraints, as demonstrated in simulation experiments of two drones racing to a goal with complex safety constraints.
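The global certificate relies on scenario optimization, in which a finite set of sampled states yields a probabilistic guarantee. As a hedged illustration, the sketch below evaluates the classical Campi-Garatti bound for a convex scenario program with `d` decision variables and searches for the smallest sample count that certifies violation probability at most `eps` with confidence at least `1 - beta`. Treating the certificate as a program with a single decision variable (for example, a level-set threshold) is an assumption made here for illustration; the paper's formulation may use a different bound.

```python
import math


def scenario_tail_bound(n, eps, d):
    """Bound on P(violation probability > eps) for the solution of a convex
    scenario program with d decision variables and n i.i.d. samples."""
    return sum(math.comb(n, i) * eps**i * (1 - eps) ** (n - i) for i in range(d))


def min_scenario_samples(eps, beta, d=1):
    """Smallest n for which the tail bound is at most beta."""
    n = d
    while scenario_tail_bound(n, eps, d) > beta:
        n += 1
    return n


# With one decision variable the bound reduces to (1 - eps) ** n.
n = min_scenario_samples(eps=0.01, beta=1e-6)
```

With `eps = 0.01` and `beta = 1e-6`, the search returns 1375 samples, since `0.99 ** 1375` is just below `1e-6`. The linear search is simple rather than fast; it is adequate for the small `d` considered here.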

SYMar 26
Active Calibration of Reachable Sets Using Approximate Pick-to-Learn

Sampada Deglurkar, Ebonye Smith, Jingqi Li et al.

Reachability computations that rely on learned or estimated models require calibration to maintain confidence in their guarantees. Calibration generally involves sampling scenarios inside the reachable set. However, producing reasonable probabilistic guarantees may require many samples, which can be costly. To remedy this, we propose that calibration of reachable sets be performed using active learning strategies. In order to produce a probabilistic guarantee on the active learning, we adapt the Pick-to-Learn algorithm, which produces generalization bounds for standard supervised learning, to the active learning setting. Our method, Approximate Pick-to-Learn, treats the process of choosing data samples as maximizing an approximate error function. We can then use conformal prediction to ensure that the approximate error is close to the true model error. We demonstrate our technique for a simulated drone racing example in which learning is used to provide an initial guess of the reachable tube. Our method requires fewer samples to calibrate the model and provides more accurate sets than the baselines. We simultaneously provide tight generalization bounds.
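The conformal-prediction step can be illustrated with standard split-conformal calibration: given nonconformity scores on held-out samples (here, hypothetically, the gap between approximate and true model error), the threshold is an order statistic of those scores. This is the textbook split-conformal quantile under an exchangeability assumption, not the Approximate Pick-to-Learn procedure itself; `conformal_threshold` is an illustrative name.

```python
import math

import numpy as np


def conformal_threshold(scores, alpha):
    """Split-conformal threshold q such that a new exchangeable score
    satisfies P(score <= q) >= 1 - alpha."""
    s = np.sort(np.asarray(scores, dtype=float))
    n = len(s)
    # Rank of the order statistic, with the usual (n + 1) finite-sample correction.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few samples to certify this alpha
    return float(s[k - 1])


# Ten hypothetical calibration scores.
scores = [0.3, 0.1, 0.7, 0.2, 0.9, 0.4, 0.6, 0.5, 0.8, 1.0]
q = conformal_threshold(scores, alpha=0.1)
```

With only ten scores, `alpha = 0.1` already requires the largest score, and any `alpha` below `1 / 11` returns an infinite threshold. This is the sample-cost pressure that motivates choosing calibration samples actively.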

LGDec 23, 2021
Safety and Liveness Guarantees through Reach-Avoid Reinforcement Learning

Kai-Chieh Hsu, Vicenç Rubies-Royo, Claire J. Tomlin et al.

Reach-avoid optimal control problems, in which the system must reach certain goal conditions while staying clear of unacceptable failure modes, are central to safety and liveness assurance for autonomous robotic systems, but their exact solutions are intractable for complex dynamics and environments. Recent successes of reinforcement learning methods in approximately solving optimal control problems with performance objectives make their application to certification problems attractive; however, the Lagrange-type objective used in reinforcement learning is not suitable for encoding temporal logic requirements. Recent work has shown promise in extending the reinforcement learning machinery to safety-type problems, whose objective is not a sum, but a minimum (or maximum) over time. In this work, we generalize the reinforcement learning formulation to handle all optimal control problems in the reach-avoid category. We derive a time-discounted reach-avoid Bellman backup with contraction mapping properties and prove that the resulting reach-avoid Q-learning algorithm converges under analogous conditions to the traditional Lagrange-type problem, yielding an arbitrarily tight conservative approximation to the reach-avoid set. We further demonstrate the use of this formulation with deep reinforcement learning methods, retaining zero-violation guarantees by treating the approximate solutions as untrusted oracles in a model-predictive supervisory control framework. We evaluate our proposed framework on a range of nonlinear systems, validating the results against analytic and numerical solutions, and through Monte Carlo simulation in previously intractable problems. Our results open the door to a range of learning-based methods for safe-and-live autonomous behavior, with applications across robotics and automation. See https://github.com/SafeRoboticsLab/safety_rl for code and supplementary material.
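A tabular, deterministic analogue of the discounted reach-avoid backup can be sketched as value iteration. The sketch assumes the convention that the target is `{l(s) <= 0}` and the failure set is `{g(s) > 0}`, and assumes the backup `V(s) = (1 - gamma) * max(l(s), g(s)) + gamma * max(g(s), min(l(s), min_a V(s')))`. This is our reading of the discounted form, not code from the linked repository. Because `max` and `min` are non-expansive, this operator is a `gamma`-contraction, so the iteration converges from any initialization. The corridor example and all names are hypothetical.

```python
import numpy as np


def reach_avoid_value_iteration(l, g, neighbors, gamma=0.99, tol=1e-10, max_iter=100_000):
    """Discounted reach-avoid value iteration on a finite, deterministic state space.

    Assumed convention: target = {s : l(s) <= 0}, failure = {s : g(s) > 0}.
    States with V(s) <= 0 can (conservatively) reach the target while
    avoiding the failure set. neighbors[s] lists the successors of s, one per action.
    """
    l = np.asarray(l, dtype=float)
    g = np.asarray(g, dtype=float)
    terminal = np.maximum(l, g)
    V = terminal.copy()
    for _ in range(max_iter):
        # Best (lowest) successor value over all actions.
        best_next = np.array([V[succ].min() for succ in neighbors])
        V_new = (1 - gamma) * terminal + gamma * np.maximum(g, np.minimum(l, best_next))
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


# 1-D corridor of 10 cells; actions move left, stay, or move right.
states = np.arange(10)
l = np.abs(states - 8) - 0.5  # target: cell 8
g = 0.5 - np.abs(states - 3)  # failure: cell 3
neighbors = [[max(s - 1, 0), s, min(s + 1, 9)] for s in states]
V = reach_avoid_value_iteration(l, g, neighbors)
print(np.flatnonzero(V <= 0))  # [4 5 6 7 8 9]
```

Cells 0 to 2 are excluded because every path to the target crosses the failure cell. Lowering `gamma` shrinks the certified set further, which illustrates the conservative nature of the discounted approximation.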

SYMar 20
A Spectral Perspective on Stochastic Control Barrier Functions

Inkyu Jang, Chams E. Mballo, Claire J. Tomlin et al.

Stochastic control barrier functions (SCBFs) provide a safety-critical control framework for systems subject to stochastic disturbances by bounding the probability of remaining within a safe set. However, synthesizing a valid SCBF that explicitly reflects the true safety probability of the system, which is the most natural measure of safety, remains a challenge. This paper addresses this issue by adopting a spectral perspective, utilizing the linear operator that governs the evolution of the closed-loop system's safety probability. We find that the dominant eigenpair of this Koopman-like operator encodes fundamental safety information of the stochastic system. The dominant eigenfunction is a natural and valid SCBF, with values that explicitly quantify the relative long-term safety of the state, while the dominant eigenvalue indicates the global rate at which the safety probability decays. A practical synthesis algorithm is proposed, termed power-policy iteration, which jointly computes the dominant eigenpair and an optimized backup policy. The method is validated using simulation experiments on safety-critical dynamics models.
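For a finite Markov chain under a fixed policy, the linear operator described above reduces to the transition matrix restricted to safe states, which is sub-stochastic because probability mass leaks out of the safe set. The sketch below runs plain power iteration on such a matrix. It only illustrates the eigenpair computation: the paper's power-policy iteration also optimizes the backup policy, which is not shown. The matrix entries are hypothetical.

```python
import numpy as np


def dominant_eigenpair(P_safe, tol=1e-12, max_iter=100_000):
    """Power iteration for the dominant eigenpair of a sub-stochastic matrix.

    P_safe[i, j] = probability of moving from safe state i to safe state j
    under a fixed policy; the missing row mass is the one-step probability
    of leaving the safe set.
    """
    n = P_safe.shape[0]
    v = np.ones(n) / np.sqrt(n)
    lam = 0.0
    w = v
    for _ in range(max_iter):
        w = P_safe @ v
        lam = np.linalg.norm(w)  # growth factor; equals the eigenvalue at convergence
        w = w / lam
        if np.linalg.norm(w - v) < tol:
            break
        v = w
    return lam, w


P_safe = np.array([
    [0.90, 0.05, 0.00],
    [0.05, 0.90, 0.03],
    [0.00, 0.10, 0.80],
])
lam, phi = dominant_eigenpair(P_safe)
# Probability of remaining safe for k steps from each state: P_safe^k applied to ones.
k = 50
safety_prob = np.linalg.matrix_power(P_safe, k) @ np.ones(3)
```

For large `k`, `safety_prob` decays roughly like `lam ** k` and its profile across states approaches a multiple of `phi`. This is why the eigenfunction ranks states by long-term safety.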

LGFeb 7, 2024
Safety Filters for Black-Box Dynamical Systems by Learning Discriminating Hyperplanes

Will Lavanakul, Jason J. Choi, Koushil Sreenath et al.

Learning-based methods are emerging as an effective way to design safety filters for black-box dynamical systems. Existing methods have relied on certificate functions like Control Barrier Functions (CBFs) and Hamilton-Jacobi (HJ) reachability value functions. The primary motivation for our work is the recognition that, ultimately, what matters is enforcing the safety constraint as a control input constraint at each state. By focusing on this constraint, we can eliminate dependence on any specific certificate function-based design. To achieve this, we define a discriminating hyperplane that shapes the half-space constraint on control input at each state, serving as a sufficient condition for safety. This concept not only generalizes traditional safety methods but also simplifies safety filter design by removing the need for a specific certificate function. We present two strategies to learn the discriminating hyperplane: (a) a supervised learning approach, using pre-verified control invariant sets for labeling, and (b) a reinforcement learning (RL) approach, which does not require such labels. The main advantage of our method, unlike conventional safe RL approaches, is the separation of performance and safety. This offers a reusable safety filter for learning new tasks, avoiding the need to retrain from scratch. As such, we believe that the new notion of the discriminating hyperplane offers a more generalizable direction towards designing safety filters, encompassing and extending existing certificate-function-based or safe RL methodologies.
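Once a hyperplane `a(x) @ u + b(x) >= 0` is available at the current state, the least-invasive filter is the Euclidean projection of a reference input onto that half-space, which has a closed form. The sketch assumes `a` and `b` are given (for example, output by a learned network evaluated at `x`) and that `a` is nonzero; the function name and numbers are illustrative.

```python
import numpy as np


def half_space_safety_filter(u_ref, a, b):
    """Minimally modify u_ref so that a @ u + b >= 0.

    Solves min ||u - u_ref||^2 subject to a @ u + b >= 0 in closed form
    (projection onto a half-space). Assumes a is a nonzero vector.
    """
    u_ref = np.asarray(u_ref, dtype=float)
    a = np.asarray(a, dtype=float)
    slack = a @ u_ref + b
    if slack >= 0:
        return u_ref  # the reference input already satisfies the constraint
    # Move along the constraint normal just far enough to reach the boundary.
    return u_ref - (slack / (a @ a)) * a


# Hypothetical hyperplane at some state: -u[0] + 0.5 >= 0, i.e. u[0] <= 0.5.
u_safe = half_space_safety_filter(u_ref=[1.0, 0.2], a=[-1.0, 0.0], b=0.5)
```

The filter leaves the task policy untouched whenever the constraint is inactive. This reflects the separation of performance and safety described above.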

ROJan 21, 2022
Computation of Regions of Attraction for Hybrid Limit Cycles Using Reachability: An Application to Walking Robots

Jason J. Choi, Ayush Agrawal, Koushil Sreenath et al.

Contact-rich robotic systems, such as legged robots and manipulators, are often represented as hybrid systems. However, stability analysis and region-of-attraction computation for these systems are often challenging because of the discontinuous state changes upon contact (also referred to as state resets). In this work, we cast the computation of the region of attraction as a Hamilton-Jacobi (HJ) reachability problem. This enables us to leverage HJ reachability tools that are compatible with general nonlinear system dynamics and can formally deal with state and input constraints as well as bounded disturbances. Our main contribution is the generalization of the HJ reachability framework to account for the discontinuous state changes originating from state resets, which has remained a challenge until now. We apply our approach to compute regions of attraction for several underactuated walking robots and demonstrate that the proposed approach can (a) recover a larger region of attraction than state-of-the-art approaches, (b) handle state resets, nonlinear dynamics, external disturbances, and input constraints, and (c) provide a stabilizing controller for the system that can leverage the state resets to enhance system stability.

AIDec 17, 2021
Compositional Learning-based Planning for Vision POMDPs

Sampada Deglurkar, Michael H. Lim, Johnathan Tucker et al.

The Partially Observable Markov Decision Process (POMDP) is a powerful framework for capturing decision-making problems that involve state and transition uncertainty. However, most current POMDP planners cannot effectively handle the high-dimensional image observations prevalent in real-world applications, and often require lengthy online training that involves interaction with the environment. In this work, we propose Visual Tree Search (VTS), a compositional learning and planning procedure that combines generative models learned offline with online model-based POMDP planning. The deep generative observation models evaluate the likelihood of, and predict, future image observations in a Monte Carlo tree search planner. We show that VTS is robust to different types of image noise that were not present during training and can adapt to different reward structures without the need to re-train. This new approach significantly and stably outperforms several state-of-the-art vision POMDP baselines while using a fraction of the training time.

SYSep 21, 2021
Towards cyber-physical systems robust to communication delays: A differential game approach

Shankar A. Deka, Donggun Lee, Claire J. Tomlin

Collaboration between interconnected cyber-physical systems is becoming increasingly pervasive. Time-delays in communication channels between such systems are known to induce catastrophic failure modes, like high frequency oscillations in robotic manipulators in bilateral teleoperation or string instability in platoons of autonomous vehicles. This paper considers nonlinear time-delay systems representing coupled robotic agents, and proposes controllers that are robust to time-varying communication delays. We introduce approximations that allow the delays to be considered as implicit control inputs themselves, and formulate the problem as a zero-sum differential game between the stabilizing controllers and the delays acting adversarially. The ensuing optimal control law is finally compared to known results from Lyapunov-Krasovskii based approaches via numerical experiments.

ROSep 10, 2021
Discretizing Dynamics for Maximum Likelihood Constraint Inference

Kaylene C. Stocking, David L. McPherson, Robert P. Matthew et al.

Maximum likelihood constraint inference is a powerful technique for identifying unmodeled constraints that affect the behavior of a demonstrator acting under a known objective function. However, it was originally formulated only for discrete state-action spaces. Continuous dynamics are more useful for modeling many real-world systems of interest, including the movements of humans and robots. We present a method to generate a tabular state-action space that approximates continuous dynamics and can be used for constraint inference on demonstrations that obey the true system dynamics. We then demonstrate accurate constraint inference on nonlinear pendulum systems with 2- and 4-dimensional state spaces, and show that performance is robust to a range of hyperparameters. The demonstrations are not required to be fully optimal with respect to the objective, and the most likely constraints can be identified even when demonstrations cover only a small portion of the state space. For these reasons, the proposed approach may be especially useful for inferring constraints on human demonstrators, which has important applications in human-robot interaction and biomechanical medicine.
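One simple way to obtain a tabular state-action space from continuous dynamics is to grid the state space, simulate one step from each cell center, and snap the result to the nearest cell. The sketch does this for a pendulum. The grid sizes, torque set, semi-implicit Euler step, and unit inertia (torque treated as angular acceleration) are illustrative assumptions; the paper's discretization may be constructed differently.

```python
import numpy as np


def build_tabular_pendulum(n_theta=20, n_omega=21, omega_max=4.0,
                           torques=(-1.0, 0.0, 1.0), dt=0.05, g_over_l=9.81):
    """Approximate pendulum dynamics with a deterministic tabular model.

    Returns T with T[s, a] = index of the cell reached from cell s under
    torques[a], where s = i_theta * n_omega + i_omega.
    """
    thetas = np.linspace(-np.pi, np.pi, n_theta, endpoint=False)
    omegas = np.linspace(-omega_max, omega_max, n_omega)
    dtheta = 2 * np.pi / n_theta
    domega = 2 * omega_max / (n_omega - 1)
    T = np.empty((n_theta * n_omega, len(torques)), dtype=int)
    for i, theta in enumerate(thetas):
        for j, omega in enumerate(omegas):
            for a, u in enumerate(torques):
                # One semi-implicit Euler step from the cell center (unit inertia assumed).
                omega_next = omega + dt * (-g_over_l * np.sin(theta) + u)
                theta_next = theta + dt * omega_next
                # Snap to the nearest cell; the angle wraps and the velocity saturates.
                i_next = int(np.round((theta_next + np.pi) / dtheta)) % n_theta
                j_next = int(np.clip(np.round((omega_next + omega_max) / domega), 0, n_omega - 1))
                T[i * n_omega + j, a] = i_next * n_omega + j_next
    return T


T = build_tabular_pendulum()
rest = 10 * 21 + 10  # cell with theta = 0 (hanging down) and omega = 0
```

Coarse grids can make small torques invisible, because a one-step change smaller than half a cell snaps back to the same cell. Grid resolution and `dt` therefore need to be chosen together.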

SYJun 13, 2021
Pointwise Feasibility of Gaussian Process-based Safety-Critical Control under Model Uncertainty

Fernando Castañeda, Jason J. Choi, Bike Zhang et al.

Control Barrier Functions (CBFs) and Control Lyapunov Functions (CLFs) are popular tools for enforcing safety and stability of a controlled system, respectively. They are commonly utilized to build constraints that can be incorporated in a min-norm quadratic program (CBF-CLF-QP) which solves for a safety-critical control input. However, since these constraints rely on a model of the system, when this model is inaccurate the guarantees of safety and stability can be easily lost. In this paper, we present a Gaussian Process (GP)-based approach to tackle the problem of model uncertainty in safety-critical controllers that use CBFs and CLFs. The considered model uncertainty is affected by both state and control input. We derive probabilistic bounds on the effects that such model uncertainty has on the dynamics of the CBF and CLF. We then use these bounds to build safety and stability chance constraints that can be incorporated in a min-norm convex optimization-based controller, called GP-CBF-CLF-SOCP. As the main theoretical result of the paper, we present necessary and sufficient conditions for pointwise feasibility of the proposed optimization problem. We believe that these conditions could serve as a starting point towards understanding the minimal requirements on the distribution of data collected from the real system in order to guarantee safety. Finally, we validate the proposed framework with numerical simulations of an adaptive cruise controller for an automotive system.

ROJun 7, 2021
Inferring Objectives in Continuous Dynamic Games from Noise-Corrupted Partial State Observations

Lasse Peters, David Fridovich-Keil, Vicenç Rubies-Royo et al.

Robots and autonomous systems must interact with one another and their environment to provide high-quality services to their users. Dynamic game theory provides an expressive theoretical framework for modeling scenarios involving multiple agents with differing objectives interacting over time. A core challenge when formulating a dynamic game is designing objectives for each agent that capture desired behavior. In this paper, we propose a method for inferring parametric objective models of multiple agents based on observed interactions. Our inverse game solver jointly optimizes player objectives and continuous-state estimates by coupling them through Nash equilibrium constraints. Hence, our method is able to directly maximize the observation likelihood rather than other non-probabilistic surrogate criteria. Our method does not require full observations of game states or player strategies to identify player objectives. Instead, it robustly recovers this information from noisy, partial state observations. As a byproduct of estimating player objectives, our method computes a Nash equilibrium trajectory corresponding to those objectives. Thus, it is suitable for downstream trajectory forecasting tasks. We demonstrate our method in several simulated traffic scenarios. Results show that it reliably estimates player objectives from a short sequence of noise-corrupted partial state observations. Furthermore, using the estimated objectives, our method makes accurate predictions of each player's trajectory.

ROMar 9, 2021
Analyzing Human Models that Adapt Online

Andrea Bajcsy, Anand Siththaranjan, Claire J. Tomlin et al.

Predictive human models often need to adapt their parameters online from human data. This raises previously ignored safety-related questions for robots relying on these models, such as what the model could learn online and how quickly it could learn it. For instance, when will the robot have a confident estimate of a nearby human's goal? Or, what parameter initializations guarantee that the robot can learn the human's preferences in a finite number of observations? To answer such analysis questions, our key idea is to model the robot's learning algorithm as a dynamical system where the state is the current model parameter estimate and the control is the human data the robot observes. This enables us to leverage tools from reachability analysis and optimal control to compute the set of hypotheses the robot could learn in finite time, as well as the worst- and best-case time it takes to learn them. We demonstrate the utility of our analysis tool in four human-robot domains, including autonomous driving and indoor navigation.
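The "learning algorithm as a dynamical system" view can be made concrete with a Bayesian belief update. The state is the belief over hypotheses (here, candidate goals), the control is the observed human action, and the dynamics are Bayes' rule. The sketch assumes a Boltzmann-rational likelihood with hypothetical Q-values and simulates one fixed observation sequence to count the steps until the belief is confident. The paper instead uses reachability to reason over all possible data sequences, which this sketch does not do.

```python
import numpy as np


def boltzmann_likelihoods(Q, action, beta=1.0):
    """P(action | hypothesis) for each hypothesis, assuming a Boltzmann-rational
    human: P(a | h) proportional to exp(beta * Q[h, a])."""
    logits = beta * np.asarray(Q, dtype=float)
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    return probs[:, action]


def belief_step(belief, Q, action, beta=1.0):
    """Learning dynamics: next state (belief) from current state and 'control' (observed action)."""
    posterior = np.asarray(belief, dtype=float) * boltzmann_likelihoods(Q, action, beta)
    return posterior / posterior.sum()


def steps_to_confidence(belief, Q, actions, threshold=0.9, beta=1.0):
    """Number of observations until some hypothesis reaches the threshold (None if never)."""
    b = np.asarray(belief, dtype=float)
    for t, action in enumerate(actions, start=1):
        b = belief_step(b, Q, action, beta)
        if b.max() >= threshold:
            return t, b
    return None, b


# Two candidate goals; action 0 favors goal 0 and action 2 favors goal 1.
Q = np.array([[1.0, 0.0, -1.0],
              [-1.0, 0.0, 1.0]])
t, b = steps_to_confidence([0.5, 0.5], Q, actions=[0] * 10)
```

Each observation of action 0 multiplies the odds in favor of goal 0 by `exp(2)`, so the belief crosses 0.9 after two observations (about 0.88 after one).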

ROFeb 14, 2021
FaSTrack: a Modular Framework for Real-Time Motion Planning and Guaranteed Safe Tracking

Mo Chen, Sylvia L. Herbert, Haimin Hu et al.

Real-time, guaranteed safe trajectory planning is vital for navigation in unknown environments. However, real-time navigation algorithms typically sacrifice robustness for computation speed. Alternatively, provably safe trajectory planning tends to be too computationally intensive for real-time replanning. We propose FaSTrack, Fast and Safe Tracking, a framework that achieves both real-time replanning and guaranteed safety. In this framework, real-time computation is achieved by allowing any trajectory planner to use a simplified planning model of the system. The plan is tracked by the system, represented by a more realistic, higher-dimensional tracking model. We precompute the tracking error bound (TEB) due to mismatch between the two models and due to external disturbances. We also obtain the corresponding tracking controller used to stay within the TEB. The precomputation does not require prior knowledge of the environment. We demonstrate FaSTrack using Hamilton-Jacobi reachability for precomputation and three different real-time trajectory planners with three different tracking-planning model pairs.

ROJan 15, 2021
Scalable Learning of Safety Guarantees for Autonomous Systems using Hamilton-Jacobi Reachability

Sylvia Herbert, Jason J. Choi, Suvansh Sanjeev et al.

Autonomous systems like aircraft and assistive robots often operate in scenarios where guaranteeing safety is critical. Methods like Hamilton-Jacobi reachability can provide guaranteed safe sets and controllers for such systems. However, these same scenarios often have unknown or uncertain environments, system dynamics, or predictions of other agents. As the system operates, it may learn new knowledge about these uncertainties and should therefore update its safety analysis accordingly. However, work to learn and update safety analysis has been limited to small systems of about two dimensions due to the computational complexity of the analysis. In this paper, we synthesize several techniques to speed up computation: decomposition, warm-starting, and adaptive grids. Using this new framework, we can update safe sets one or more orders of magnitude faster than prior work, making this technique practical for many realistic systems. We demonstrate our results on simulated 2D and 10D near-hover quadcopters operating in a windy environment.

LGDec 18, 2020
Voronoi Progressive Widening: Efficient Online Solvers for Continuous State, Action, and Observation POMDPs

Michael H. Lim, Claire J. Tomlin, Zachary N. Sunberg

This paper introduces Voronoi Progressive Widening (VPW), a generalization of Voronoi optimistic optimization (VOO) and action progressive widening to partially observable Markov decision processes (POMDPs). Tree search algorithms can use VPW to effectively handle continuous or hybrid action spaces by efficiently balancing local and global action searching. This paper proposes two VPW-based algorithms and analyzes them from theoretical and simulation perspectives. Voronoi Optimistic Weighted Sparse Sampling (VOWSS) is a theoretical tool that justifies VPW-based online solvers, and it is the first algorithm with global convergence guarantees for continuous state, action, and observation POMDPs. Voronoi Optimistic Monte Carlo Planning with Observation Weighting (VOMCPOW) is a versatile and efficient algorithm that consistently outperforms state-of-the-art POMDP algorithms in several simulation experiments.
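Two ingredients named above can be sketched in isolation: a progressive-widening test that limits how many actions a node may hold given its visit count, and Voronoi optimistic sampling of a new action. With probability `omega` the sketch samples uniformly (global search); otherwise it samples from the Voronoi cell of the best action so far (local search) via rejection sampling. The constants, the rejection-sampling choice, and the meaning assigned to `omega` are illustrative assumptions, and the full VOWSS and VOMCPOW algorithms (observation weighting, tree search) are not reproduced.

```python
import numpy as np


def should_widen(num_children, num_visits, k=1.0, alpha=0.5):
    """Progressive widening test: allow a new action while |C(h)| <= k * N(h)**alpha."""
    return num_children <= k * num_visits**alpha


def voo_sample(actions, values, low, high, omega=0.5, rng=None, max_tries=1000):
    """Voronoi optimistic sampling of a new action in the box [low, high].

    With probability omega, sample uniformly (global search); otherwise sample
    from the Voronoi cell of the best action so far (local search) by rejection.
    """
    rng = np.random.default_rng() if rng is None else rng
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    if len(actions) == 0 or rng.random() < omega:
        return rng.uniform(low, high)
    actions = np.asarray(actions, dtype=float)
    best = int(np.argmax(values))
    for _ in range(max_tries):
        candidate = rng.uniform(low, high)
        # The candidate lies in the best cell if the best action is its nearest neighbor.
        if int(np.argmin(np.linalg.norm(actions - candidate, axis=1))) == best:
            return candidate
    return actions[best].copy()  # fallback if rejection sampling keeps failing


rng = np.random.default_rng(0)
existing = [[0.0], [1.0]]  # two 1-D actions already in the tree
q_values = [0.2, 0.7]      # the action at 1.0 currently looks best
new_action = voo_sample(existing, q_values, low=[0.0], high=[1.0], omega=0.0, rng=rng)
```

Rejection sampling is simple but becomes slow when the best cell is a small fraction of the action space. In that case a cheaper local proposal, such as a Gaussian around the best action filtered by the same nearest-neighbor test, may be preferable.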

SYNov 14, 2020
Gaussian Process-based Min-norm Stabilizing Controller for Control-Affine Systems with Uncertain Input Effects and Dynamics

Fernando Castañeda, Jason J. Choi, Bike Zhang et al.

This paper presents a method to design a min-norm Control Lyapunov Function (CLF)-based stabilizing controller for a control-affine system with uncertain dynamics using Gaussian Process (GP) regression. In order to estimate both state- and input-dependent model uncertainty, we propose a novel compound kernel that captures the control-affine nature of the problem. Furthermore, using GP Upper Confidence Bound analysis, we provide probabilistic bounds on the regression error, leading to the formulation of a CLF-based stability chance constraint that can be incorporated in a min-norm optimization problem. We show that this optimization problem is convex, and we call it the Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP). The data collection and the training of the GP regression model are carried out in an episodic learning fashion. We validate the proposed algorithm and controller in numerical simulations of an inverted pendulum and a kinematic bicycle model, which yield stable trajectories very similar to those obtained with knowledge of the true plant dynamics.

RONov 9, 2020
Encoding Defensive Driving as a Dynamic Nash Game

Chih-Yuan Chiu, David Fridovich-Keil, Claire J. Tomlin

Robots deployed in real-world environments should operate safely in a robust manner. In scenarios where an "ego" agent navigates in an environment with multiple other "non-ego" agents, two modes of safety are commonly proposed -- adversarial robustness and probabilistic constraint satisfaction. However, while the former is generally computationally intractable and leads to overconservative solutions, the latter typically relies on strong distributional assumptions and ignores strategic coupling between agents. To avoid these drawbacks, we present a novel formulation of robustness within the framework of general-sum dynamic game theory, modeled on defensive driving. More precisely, we prepend an adversarial phase to the ego agent's cost function. That is, we prepend a time interval during which other agents are assumed to be temporarily distracted, in order to render the ego agent's equilibrium trajectory robust against other agents' potentially dangerous behavior during this time. We demonstrate the effectiveness of our new formulation in encoding safety via multiple traffic scenarios.

SYNov 1, 2020
Approximate Solutions to a Class of Reachability Games

David Fridovich-Keil, Claire J. Tomlin

In this paper, we present a method for finding approximate Nash equilibria in a broad class of reachability games. These games are often used to formulate both collision avoidance and goal satisfaction. Our method is computationally efficient, running in real-time for scenarios involving multiple players and more than ten state dimensions. The proposed approach forms a family of increasingly exact approximations to the original game. Our results characterize the quality of these approximations and show operation in a receding horizon, minimally-invasive control context. Additionally, as a special case, our method reduces to local gradient-based optimization in the single-player (optimal control) setting, for which a wide variety of efficient algorithms exist.

LGSep 7, 2020
Dynamically Computing Adversarial Perturbations for Recurrent Neural Networks

Shankar A. Deka, Dušan M. Stipanović, Claire J. Tomlin

Convolutional and recurrent neural networks have been widely employed to achieve state-of-the-art performance on classification tasks. However, it has also been noted that these networks can be manipulated adversarially with relative ease, by carefully crafted additive perturbations to the input. Although several prior works have experimentally studied crafting and defending against such attacks, it is also desirable to have theoretical guarantees on the existence of adversarial examples and on the robustness margins of the network to such examples. We provide both in this paper. We focus specifically on recurrent architectures and draw inspiration from dynamical systems theory to naturally cast this as a control problem, allowing us to dynamically compute adversarial perturbations at each timestep of the input sequence, thus resembling a feedback controller. Illustrative examples are provided to supplement the theoretical discussions.

SYApr 16, 2020
Reinforcement Learning for Safety-Critical Control under Model Uncertainty, using Control Lyapunov Functions and Control Barrier Functions

Jason Choi, Fernando Castañeda, Claire J. Tomlin et al.

In this paper, the issue of model uncertainty in safety-critical control is addressed with a data-driven approach. For this purpose, we utilize the structure of an input-output linearization controller based on a nominal model along with a Control Barrier Function and Control Lyapunov Function based Quadratic Program (CBF-CLF-QP). Specifically, we propose a novel reinforcement learning framework which learns the model uncertainty present in the CBF and CLF constraints, as well as other control-affine dynamic constraints in the quadratic program. The trained policy is combined with the nominal model-based CBF-CLF-QP, resulting in the Reinforcement Learning-based CBF-CLF-QP (RL-CBF-CLF-QP), which addresses the problem of model uncertainty in the safety constraints. The performance of the proposed method is validated by testing it on an underactuated nonlinear bipedal robot walking on randomly spaced stepping stones with one step preview, obtaining stable and safe walking under model uncertainty.
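
To make the CBF part of a CBF-CLF-QP concrete, here is a hedged sketch of the simplest case only: a single affine CBF constraint and no CLF term or learned correction, for which the QP min ||u - u_ref||^2 subject to L_f h + L_g h u + alpha h >= 0 has a closed-form solution (projection onto a half-space). The plant (a 2-D single integrator x_dot = u), the disc obstacle h(x) = ||x - p||^2 - r^2, the proportional reference controller, and all function names are assumptions for illustration.

```python
def cbf_filter(u_ref, a, b):
    """Solve min ||u - u_ref||^2  s.t.  a . u + b >= 0  (one affine constraint).

    Here a = L_g h(x) and b = L_f h(x) + alpha * h(x).
    """
    slack = sum(ai * ui for ai, ui in zip(a, u_ref)) + b
    if slack >= 0.0:
        return list(u_ref)  # nominal input already satisfies the CBF condition
    norm2 = sum(ai * ai for ai in a)
    # Project u_ref onto the boundary a . u + b = 0 of the safe half-space.
    return [ui - slack / norm2 * ai for ui, ai in zip(u_ref, a)]

def simulate(x0, goal, p, r, alpha=1.0, k=1.0, dt=0.01, steps=1000):
    """Drive a 2-D single integrator toward goal while avoiding the disc (p, r)."""
    x = list(x0)
    hs = []
    for _ in range(steps):
        dx = [x[0] - p[0], x[1] - p[1]]
        h = dx[0] ** 2 + dx[1] ** 2 - r ** 2       # h >= 0 outside the disc
        hs.append(h)
        u_ref = [-k * (x[0] - goal[0]), -k * (x[1] - goal[1])]
        a = [2.0 * dx[0], 2.0 * dx[1]]             # L_g h for x_dot = u; L_f h = 0
        u = cbf_filter(u_ref, a, alpha * h)
        x = [x[0] + dt * u[0], x[1] + dt * u[1]]   # forward Euler step
    return x, hs

x_final, hs = simulate((-2.0, 0.1), (2.0, 0.0), (0.0, 0.0), 0.5)
```

For this quadratic h, the Euler update gives h_{k+1} >= (1 - dt * alpha) h_k whenever the constraint holds, so h stays positive for dt * alpha < 1. Adding a CLF constraint (with a relaxation variable) generally removes the closed form, and one would then call a QP solver.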

SYApr 15, 2020
Improving Input-Output Linearizing Controllers for Bipedal Robots via Reinforcement Learning

Fernando Castañeda, Mathias Wulfman, Ayush Agrawal et al.

The main drawbacks of input-output linearizing controllers are their need for precise dynamics models and their inability to account for input constraints. Model uncertainty is common in almost every robotic application, and input saturation is present in every real-world system. In this paper, we address both challenges for the specific case of bipedal robot control using reinforcement learning techniques. Taking the structure of a standard input-output linearizing controller, we use an additive learned term that compensates for model uncertainty. Moreover, by adding constraints to the learning problem, we boost the performance of the final controller when input limits are present. We demonstrate the effectiveness of the designed framework for different levels of uncertainty on the five-link planar walking robot RABBIT.
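
The controller structure described above (nominal input-output linearization, plus an additive correction, plus saturation) can be sketched on a hypothetical scalar plant q_ddot = -a sin(q) + u rather than the five-link RABBIT model. In this sketch the "learned" term is replaced by an oracle residual that exactly cancels the model mismatch, standing in for what a trained policy would approximate; the plant, the gains, and the names `io_lin_control`, `oracle_residual`, and `simulate` are all assumptions.

```python
import math

def io_lin_control(q, qd, q_ref, a_hat, kp, kd, u_max, delta=None):
    """Input-output linearizing control for q_ddot = -a sin(q) + u (g = 1),
    with an optional additive correction term and input saturation."""
    v = -kp * (q - q_ref) - kd * qd          # desired linear output dynamics
    u = v + a_hat * math.sin(q)              # cancel the *nominal* drift only
    if delta is not None:
        u += delta(q, qd)                    # additive correction term
    return max(-u_max, min(u_max, u))        # respect the input limits

def oracle_residual(q, qd, a_true=2.0, a_hat=1.0):
    """Stand-in for a trained network: the exact residual (a_true - a_hat) sin(q)."""
    return (a_true - a_hat) * math.sin(q)

def simulate(delta=None, a_true=2.0, a_hat=1.0, q_ref=0.5, kp=4.0, kd=4.0,
             u_max=5.0, dt=1e-3, steps=10_000):
    """Regulate q to q_ref on the true plant; return final error and inputs."""
    q, qd = 0.0, 0.0
    us = []
    for _ in range(steps):
        u = io_lin_control(q, qd, q_ref, a_hat, kp, kd, u_max, delta)
        us.append(u)
        qdd = -a_true * math.sin(q) + u      # true plant, a_true != a_hat
        qd += dt * qdd                       # semi-implicit Euler
        q += dt * qd
    return abs(q - q_ref), us

err_nominal, _ = simulate()                  # model mismatch, no correction
err_corrected, us = simulate(delta=oracle_residual)
```

Without the correction, the mismatch leaves a steady-state error (about 0.1 rad for these values); with the residual cancelled, the closed loop follows the designed linear error dynamics and the error decays to zero. When the clip is active, the cancellation no longer holds exactly, which is the situation the paper's constrained learning problem targets.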

LGApr 6, 2020
Technical Report: Adaptive Control for Linearizable Systems Using On-Policy Reinforcement Learning

Tyler Westenbroek, Eric Mazumdar, David Fridovich-Keil et al.

This paper proposes a framework for adaptively learning a feedback linearization-based tracking controller for an unknown system using discrete-time model-free policy-gradient parameter update rules. The primary advantage of the scheme over standard model-reference adaptive control techniques is that it does not require the learned inverse model to be invertible at all instances of time. This enables the use of general function approximators to approximate the linearizing controller for the system without having to worry about singularities. However, the discrete-time and stochastic nature of these algorithms precludes the direct application of standard machinery from the adaptive control literature to provide deterministic stability proofs for the system. Nevertheless, we leverage these techniques alongside tools from the stochastic approximation literature to demonstrate that with high probability the tracking and parameter errors concentrate near zero when a certain persistence of excitation condition is satisfied. A simulated example of a double pendulum demonstrates the utility of the proposed theory.
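
The sketch below illustrates only the general shape of model-free tuning of a linearizing controller: the learner never sees the plant parameters and adjusts its controller parameters using measured tracking error alone. It deliberately swaps in deterministic central finite differences for the paper's stochastic policy-gradient update and does not reproduce its persistence-of-excitation analysis. The scalar plant x_dot = a x + b u, the parameterization u = theta0 * v + theta1 * x, the data set, and all names are hypothetical.

```python
def plant_xdot(x, u, a=1.0, b=2.0):
    """Black-box scalar plant x_dot = a x + b u; a and b are unknown to the learner."""
    return a * x + b * u

def controller(theta, x, v):
    """Parameterized linearizing controller u = theta0 * v + theta1 * x.

    The ideal parameters (1/b, -a/b) make the closed loop satisfy x_dot = v.
    """
    return theta[0] * v + theta[1] * x

def tracking_loss(theta, data):
    """Mean squared mismatch between the measured x_dot and the commanded v."""
    return sum((plant_xdot(x, controller(theta, x, v)) - v) ** 2
               for x, v in data) / len(data)

def tune(theta, data, lr=0.1, iters=50, h=1e-4):
    """Gradient descent using central differences of measured losses only."""
    theta = list(theta)
    for _ in range(iters):
        grad = []
        for i in range(len(theta)):
            tp, tm = list(theta), list(theta)
            tp[i] += h
            tm[i] -= h
            grad.append((tracking_loss(tp, data) - tracking_loss(tm, data)) / (2 * h))
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

# (x, v) samples; varied enough that both parameters are identifiable,
# a crude analogue of a persistence-of-excitation requirement.
data = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (-1.0, 1.0)]
theta = tune([0.0, 0.0], data)
```

Because the loss is quadratic in theta here, central differences recover the exact gradient and the iteration converges to (1/b, -a/b) = (0.5, -0.5). With a less varied data set (for example, x always zero), theta1 would be unidentifiable, mirroring why an excitation condition appears in the paper's result.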

ROFeb 11, 2020
Inference-Based Strategy Alignment for General-Sum Differential Games

Lasse Peters, David Fridovich-Keil, Claire J. Tomlin et al.

In many settings where multiple agents interact, the optimal choices for each agent depend heavily on the choices of the others. These coupled interactions are well-described by a general-sum differential game, in which players have differing objectives, the state evolves in continuous time, and optimal play may be characterized by one of many equilibrium concepts, e.g., a Nash equilibrium. Often, problems admit multiple equilibria. From the perspective of a single agent in such a game, this multiplicity of solutions can introduce uncertainty about how other agents will behave. This paper proposes a general framework for resolving ambiguity between equilibria by reasoning about the equilibrium other agents are aiming for. We demonstrate this framework in simulations of a multi-player human-robot navigation problem that yields two main conclusions: First, by inferring which equilibrium humans are operating at, the robot is able to predict trajectories more accurately, and second, by discovering and aligning itself to this equilibrium the robot is able to reduce the cost for all players.
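
One way to make "inferring which equilibrium humans are operating at" concrete is Bayesian inference over a finite set of candidate equilibria: each candidate predicts a trajectory for the other agent, and the observed trajectory reweights the candidates. The sketch assumes the candidate trajectories are already computed and that observations carry i.i.d. isotropic Gaussian noise with standard deviation `sigma`; the likelihood model, the two-candidate example, and the function name are assumptions, not the paper's exact formulation.

```python
import math

def equilibrium_posterior(candidates, observed, sigma=0.1, prior=None):
    """Posterior over candidate equilibria given an observed trajectory.

    candidates: one predicted trajectory per equilibrium, each a list of
                (x, y) positions of the other agent.
    observed:   observed (x, y) positions over the same time steps.
    """
    n = len(candidates)
    if prior is None:
        prior = [1.0 / n] * n
    log_post = []
    for traj, p in zip(candidates, prior):
        sq = sum((ox - px) ** 2 + (oy - py) ** 2
                 for (ox, oy), (px, py) in zip(observed, traj))
        log_post.append(math.log(p) - sq / (2.0 * sigma ** 2))
    m = max(log_post)                       # log-sum-exp for numerical stability
    w = [math.exp(lp - m) for lp in log_post]
    z = sum(w)
    return [wi / z for wi in w]

# Hypothetical equilibria: the other agent passes on the left or on the right.
left = [(0.0, 0.0), (1.0, 0.2), (2.0, 0.4)]
right = [(0.0, 0.0), (1.0, -0.2), (2.0, -0.4)]
observed = [(0.0, 0.0), (1.0, 0.15), (2.0, 0.35)]
post = equilibrium_posterior([left, right], observed)
```

The posterior concentrates on the "pass left" equilibrium; a robot could then predict with, and plan against, that equilibrium instead of an arbitrary one.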