OC · Aug 4, 2012
Provably Safe and Robust Learning-Based Model Predictive Control
Anil Aswani, Humberto Gonzalez, S. Shankar Sastry et al.
Controller design faces a trade-off between robustness and performance, and the reliability of linear controllers has caused many practitioners to focus on the former. However, there is renewed interest in improving system performance to deal with growing energy constraints. This paper describes a learning-based model predictive control (LBMPC) scheme that provides deterministic guarantees on robustness, while statistical identification tools are used to identify richer models of the system in order to improve performance; the benefits of this framework are that it handles state and input constraints, optimizes system performance with respect to a cost function, and can be designed to use a wide variety of parametric or nonparametric statistical tools. The main insight of LBMPC is that safety and performance can be decoupled under reasonable conditions in an optimization framework by maintaining two models of the system. The first is an approximate model with bounds on its uncertainty, and the second model is updated by statistical methods. LBMPC improves performance by choosing inputs that minimize a cost subject to the learned dynamics, and it ensures safety and robustness by checking whether these same inputs keep the approximate model stable when it is subject to uncertainty. Furthermore, we show that if the system is sufficiently excited, then the LBMPC control action probabilistically converges to that of an MPC computed using the true dynamics.
CV · Jul 14, 2023 · Code
Linking vision and motion for self-supervised object-centric perception
Kaylene C. Stocking, Zak Murez, Vijay Badrinarayanan et al.
Object-centric representations enable autonomous driving algorithms to reason about interactions between many independent agents and scene features. Traditionally these representations have been obtained via supervised learning, but this decouples perception from the downstream driving task and could harm generalization. In this work we adapt a self-supervised object-centric vision model to perform object decomposition using only RGB video and the pose of the vehicle as inputs. We demonstrate that our method obtains promising results on the Waymo Open perception dataset. While object mask quality lags behind supervised methods or alternatives that use more privileged information, we find that our model is capable of learning a representation that fuses multiple camera viewpoints over time and successfully tracks many vehicles and pedestrians in the dataset. Code for our model is available at https://github.com/wayveai/SOCS.
SY · Nov 15, 2016
Event Detection and Localization in Distribution Grids with Phasor Measurement Units
Omid Ardakanian, Ye Yuan, Roel Dobbe et al.
The recent introduction of synchrophasor technology into power distribution systems has given impetus to various monitoring, diagnostic, and control applications, such as system identification and event detection, which are crucial for restoring service, preventing outages, and managing equipment health. Drawing on the existing framework for inferring topology and admittances of a power network from voltage and current phasor measurements, this paper proposes an online algorithm for event detection and localization in unbalanced three-phase distribution systems. Using a convex relaxation and a matrix partitioning technique, the proposed algorithm is capable of identifying topology changes and attributing them to specific categories of events. The performance of this algorithm is evaluated on a standard test distribution feeder with synthesized loads, and it is shown that a tripped line can be detected and localized in an accurate and timely fashion, highlighting its potential for real-world applications.
MA · Jan 31, 2017
Reachability-Based Safety and Goal Satisfaction of Unmanned Aerial Platoons on Air Highways
Mo Chen, Qie Hu, Jaime Fisac et al.
Recently, there has been immense interest in using unmanned aerial vehicles (UAVs) for civilian operations. As a result, unmanned aerial systems traffic management is needed to ensure the safety and goal satisfaction of potentially thousands of UAVs flying simultaneously. Currently, the analysis of large multi-agent systems cannot tractably provide these guarantees if the agents' set of maneuvers is unrestricted. In this paper, platoons of UAVs flying on air highways are proposed to impose an airspace structure that allows for tractable analysis. For the air highway placement problem, the fast marching method is used to produce a sequence of air highways that minimizes the cost of flying from an origin to any destination. The placement of air highways can be updated in real-time to accommodate sudden airspace changes. Within platoons traveling on air highways, each vehicle is modeled as a hybrid system. Using Hamilton-Jacobi reachability, safety and goal satisfaction are guaranteed for all mode transitions. For a single altitude range, the proposed approach guarantees safety for one safety breach per vehicle; in the unlikely event of multiple safety breaches, safety can be guaranteed over multiple altitude ranges. We demonstrate the platooning concept through simulations of three representative scenarios.
LG · Jun 21, 2022
Lyapunov Density Models: Constraining Distribution Shift in Learning-Based Control
Katie Kang, Paula Gradu, Jason Choi et al.
Learned models and policies can generalize effectively when evaluated within the distribution of the training data, but can produce unpredictable and erroneous outputs on out-of-distribution inputs. In order to avoid distribution shift when deploying learning-based control algorithms, we seek a mechanism to constrain the agent to states and actions that resemble those that it was trained on. In control theory, Lyapunov stability and control-invariant sets allow us to make guarantees about controllers that stabilize the system around specific states, while in machine learning, density models allow us to estimate the training data distribution. Can we combine these two concepts, producing learning-based control algorithms that constrain the system to in-distribution states using only in-distribution actions? In this work, we propose to do this by combining concepts from Lyapunov stability and density estimation, introducing Lyapunov density models: a generalization of control Lyapunov functions and density models that provides guarantees on an agent's ability to stay in-distribution over its entire trajectory.
SY · Jul 3, 2016
Residential Demand Response Targeting Using Machine Learning with Observational Data
Datong Zhou, Maximilian Balandat, Claire Tomlin
The large scale deployment of Advanced Metering Infrastructure among residential energy customers has served as a boon for energy systems research relying on granular consumption data. Residential Demand Response aims to utilize the flexibility of consumers to reduce their energy usage during times when the grid is strained. Suitable incentive mechanisms to encourage customers to deviate from their usual behavior have to be implemented to correctly control the bids into the wholesale electricity market as a Demand Response provider. In this paper, we present a framework for short term load forecasting on an individual user level, and relate nonexperimental estimates of Demand Response efficacy, i.e. the estimated reduction of consumption during Demand Response events, to the variability of user consumption. We apply our framework on a data set from a residential Demand Response program in the Western United States. Our results suggest that users with more variable consumption patterns are more likely to reduce their consumption compared to users with a more regular consumption behavior.
LG · Dec 1, 2022
Multi-Task Imitation Learning for Linear Dynamical Systems
Thomas T. Zhang, Katie Kang, Bruce D. Lee et al.
We study representation learning for efficient imitation learning over linear systems. In particular, we consider a setting where learning is split into two phases: (a) a pre-training step where a shared $k$-dimensional representation is learned from $H$ source policies, and (b) a target policy fine-tuning step where the learned representation is used to parameterize the policy class. We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left( \frac{k n_x}{HN_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}}\right)$, where $n_x > k$ is the state dimension, $n_u$ is the input dimension, $N_{\mathrm{shared}}$ denotes the total amount of data collected for each policy during representation learning, and $N_{\mathrm{target}}$ is the amount of target task data. This result formalizes the intuition that aggregating data across related tasks to learn a representation can significantly improve the sample efficiency of learning a target task. The trends suggested by this bound are corroborated in simulation.
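As a rough illustration of the trend this bound suggests, the sketch below evaluates the two terms of the rate while ignoring the constants and logarithmic factors hidden by the $\tilde{O}$ notation. The function name and the numeric values are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def imitation_gap_rate(k, n_x, n_u, H, n_shared, n_target):
    """Evaluate k*n_x/(H*N_shared) + k*n_u/N_target.

    Constants and log factors hidden by the tilde-O notation are ignored,
    so only relative comparisons between settings are meaningful.
    """
    representation_term = k * n_x / (H * n_shared)
    target_term = k * n_u / n_target
    return representation_term + target_term


# Hypothetical settings: same target data, but 1 versus 20 source policies.
one_source = imitation_gap_rate(k=2, n_x=10, n_u=4, H=1, n_shared=100, n_target=50)
many_sources = imitation_gap_rate(k=2, n_x=10, n_u=4, H=20, n_shared=100, n_target=50)
```

Under these assumed values, adding source policies shrinks only the first term, so the second term (which depends on the target data alone) eventually dominates.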
SY · Nov 27, 2019
Linear Single- and Three-Phase Voltage Forecasting and Bayesian State Estimation with Limited Sensing
Roel Dobbe, Werner van Westering, Stephan Liu et al.
Implementing state estimation in low and medium voltage power distribution is still challenging given the scale of many networks and the reliance of traditional methods on a large number of measurements. This paper proposes a method to improve voltage predictions in real-time by leveraging a limited set of real-time measurements. The method relies on Bayesian estimation formulated as a linear least squares estimation problem, which resembles the classical weighted least-squares (WLS) approach for scenarios where full network observability is not available. We build on recently developed linear approximations for unbalanced three-phase power flow to construct voltage predictions as a linear mapping of load predictions constructed with Gaussian processes. The estimation step to update the voltage forecasts in real-time is a linear computation allowing fast high-resolution state estimate updates. The uncertainty in forecasts can be determined a priori and smoothed a posteriori, making the method useful for planning, operation, and post-hoc analysis. The method outperforms conventional WLS and is applied to different test feeders and validated on a real test feeder with the utility Alliander in The Netherlands.
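To make the idea of a linear Bayesian update concrete, here is a minimal sketch of the textbook Gaussian conditioning step for a single scalar measurement. It is a generic illustration, not the paper's estimator: the function name, the two-bus forecast, and all numeric values are assumptions, and the prior covariance is assumed symmetric.

```python
def measurement_update(mean, cov, h, z, r):
    """Condition a Gaussian forecast (mean, cov) on z = h . x + noise.

    mean: prior mean (list), cov: symmetric prior covariance (list of lists),
    h: measurement row vector, z: measured value, r: noise variance.
    """
    n = len(mean)
    # P h, which equals (h^T P)^T because the covariance is symmetric.
    ph = [sum(cov[i][j] * h[j] for j in range(n)) for i in range(n)]
    innovation_var = sum(h[i] * ph[i] for i in range(n)) + r
    gain = [v / innovation_var for v in ph]
    innovation = z - sum(h[i] * mean[i] for i in range(n))
    new_mean = [mean[i] + gain[i] * innovation for i in range(n)]
    new_cov = [[cov[i][j] - gain[i] * ph[j] for j in range(n)] for i in range(n)]
    return new_mean, new_cov


# Hypothetical two-bus voltage forecast (per unit) with correlated errors;
# only the first bus is measured.
prior_mean = [1.0, 1.0]
prior_cov = [[0.04, 0.02], [0.02, 0.04]]
post_mean, post_cov = measurement_update(prior_mean, prior_cov, h=[1.0, 0.0], z=0.9, r=0.01)
```

Because the forecast errors are assumed correlated, the measurement at the first bus also shifts the estimate and reduces the variance at the unmeasured second bus. The posterior covariance does not depend on the measured value, which is consistent with the abstract's remark that uncertainty can be determined a priori.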
OC · May 22, 2020
Customized Local Differential Privacy for Multi-Agent Distributed Optimization
Roel Dobbe, Ye Pu, Jingge Zhu et al.
Real-time data-driven optimization and control problems over networks may require sensitive information of participating users to calculate solutions and decision variables, such as in traffic or energy systems. Adversaries with access to coordination signals may potentially decode information on individual users and put user privacy at risk. We develop local differential privacy, which is a strong notion that guarantees user privacy regardless of any auxiliary information an adversary may have, for a larger family of convex distributed optimization problems. The mechanism allows agents to customize their own privacy level based on local needs and parameter sensitivities. We propose a general sampling based approach for determining sensitivity and derive analytical bounds for specific quadratic problems. We analyze inherent trade-offs between privacy and suboptimality and propose allocation schemes to divide the maximum allowable noise, a privacy budget, among all participating agents. Our algorithm is implemented to enable privacy in distributed optimal power flow for electric grids.
SY · Apr 25, 2012
Quantitative Methods for Comparing Different HVAC Control Schemes
Anil Aswani, Neal Master, Jay Taneja et al.
Experimentally comparing the energy usage and comfort characteristics of different controllers in heating, ventilation, and air-conditioning (HVAC) systems is difficult because variations in weather and occupancy conditions preclude the possibility of establishing equivalent experimental conditions across the order of hours, days, and weeks. This paper is concerned with defining quantitative metrics of energy usage and occupant comfort, which can be computed and compared in a rigorous manner that is capable of determining whether differences between controllers are statistically significant in the presence of such environmental fluctuations. Experimental case studies are presented that compare two alternative controllers (a schedule controller and a hybrid system learning-based model predictive controller) to the default controller in a building-wide HVAC system. Lastly, we discuss how our proposed methodology may also be able to quantify the efficiency of other building automation systems.
OC · Jul 11, 2012
Incentive Design for Efficient Building Quality of Service
Anil Aswani, Claire Tomlin
Buildings are a large consumer of energy, and reducing their energy usage may provide financial and societal benefits. One challenge in achieving efficient building operation is the fact that few financial motivations exist for encouraging low energy configuration and operation of buildings. As a result, incentive schemes for managers of large buildings are being proposed for the purpose of saving energy. This paper focuses on incentive design for the configuration and operation of building-wide heating, ventilation, and air-conditioning (HVAC) systems, because these systems constitute the largest portion of energy usage in most buildings. We begin with an empirical model of a building-wide HVAC system, which describes the tradeoffs between energy consumption, quality of service (as defined by occupant satisfaction), and the amount of work required for maintenance and configuration. The model has significant non-convexities, and so we derive some results regarding qualitative properties of non-convex optimization problems with certain partial-ordering features. These results are used to show that "baselining" incentive schemes suffer from moral hazard problems, and they also encourage energy reductions at the expense of also decreasing occupant satisfaction. We propose an alternative incentive scheme that has the interpretation of a performance-based bonus. A theoretical analysis shows that this encourages energy and monetary savings and modest gains in occupant satisfaction and quality of service, which is confirmed by our numerical simulations.
LG · Oct 2, 2023
Deep Neural Networks Tend To Extrapolate Predictably
Katie Kang, Amrith Setlur, Claire Tomlin et al. · cmu
Conventional wisdom suggests that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs. Our work reassesses this assumption for neural networks with high-dimensional inputs. Rather than extrapolating in arbitrary ways, we observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD. Moreover, we find that this value often closely approximates the optimal constant solution (OCS), i.e., the prediction that minimizes the average loss over the training data without observing the input. We present results showing this phenomenon across 8 datasets with different distributional shifts (including CIFAR10-C and ImageNet-R, S), different loss functions (cross entropy, MSE, and Gaussian NLL), and different architectures (CNNs and transformers). Furthermore, we present an explanation for this behavior, which we first validate empirically and then study theoretically in a simplified setting involving deep homogeneous networks with ReLU activations. Finally, we show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
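The optimal constant solution described above can be computed directly from training labels for two of the listed losses. The sketch below relies on two standard facts: the constant class distribution that minimizes average cross entropy is the empirical label frequency, and the constant that minimizes mean squared error is the target mean. The function names and toy data are hypothetical.

```python
from collections import Counter


def ocs_cross_entropy(labels):
    """Constant class distribution minimizing average cross entropy:
    the empirical frequency of each label in the training set."""
    counts = Counter(labels)
    total = len(labels)
    return {label: counts[label] / total for label in sorted(counts)}


def ocs_mse(targets):
    """Constant prediction minimizing mean squared error: the target mean."""
    return sum(targets) / len(targets)


# Toy training labels and regression targets, for illustration only.
class_ocs = ocs_cross_entropy([0, 0, 1, 2])
regression_ocs = ocs_mse([1.0, 2.0, 3.0])
```

Comparing a network's outputs on increasingly OOD inputs against such a constant is one way to check the tendency the abstract describes.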
SY · May 21, 2018
Blind Identification of Fully Observed Linear Time-Varying Systems via Sparse Recovery
Roel Dobbe, Stephan Liu, Ye Yuan et al.
Discrete-time linear time-varying (LTV) systems form a powerful class of models to approximate complex dynamical systems with nonlinear dynamics for the purpose of analysis, design and control. Motivated by inference of spatio-temporal dynamics in breast cancer research, we propose a method to efficiently solve an identification problem for a specific class of discrete-time LTV systems, in which the states are fully observed and there is no access to system inputs. In addition, it is assumed that we do not know on which states the inputs act, which can change between time steps, and that the total number of inputs is sparse over all states and over time. The problem is formulated as a compressive sensing problem, which incorporates the effect of measurement noise and which has a solution with a partially sparse support. We derive sufficient conditions for the unique recovery of the system model and input values, which lead to practical conditions on the number of experiments and rank conditions on system outputs. Synthetic experiments analyze the method's sensitivity to noise for randomly generated models.
CV · Jul 15, 2024 · Code
Understanding the Dependence of Perception Model Competency on Regions in an Image
Sara Pohland, Claire Tomlin
While deep neural network (DNN)-based perception models are useful for many applications, these models are black boxes and their outputs are not yet well understood. To confidently enable a real-world, decision-making system to utilize such a perception model without human intervention, we must enable the system to reason about the perception model's level of competency and respond appropriately when the model is incompetent. In order for the system to make an intelligent decision about the appropriate action when the model is incompetent, it would be useful for the system to understand why the model is incompetent. We explore five novel methods for identifying regions in the input image contributing to low model competency, which we refer to as image cropping, segment masking, pixel perturbation, competency gradients, and reconstruction loss. We assess the ability of these five methods to identify unfamiliar objects, recognize regions associated with unseen classes, and identify unexplored areas in an environment. We find that the competency gradients and reconstruction loss methods show great promise in identifying regions associated with low model competency, particularly when aspects of the image that are unfamiliar to the perception model are causing this reduction in competency. Both of these methods boast low computation times and high levels of accuracy in detecting image regions that are unfamiliar to the model, allowing them to provide potential utility in decision-making pipelines. The code for reproducing our methods and results is available on GitHub: https://github.com/sarapohland/explainable-competency.
RO · Jul 8, 2024
Stranger Danger! Identifying and Avoiding Unpredictable Pedestrians in RL-based Social Robot Navigation
Sara Pohland, Alvin Tan, Prabal Dutta et al.
Reinforcement learning (RL) methods for social robot navigation show great success navigating robots through large crowds of people, but the performance of these learning-based methods tends to degrade in particularly challenging or unfamiliar situations due to the models' dependency on representative training data. To ensure human safety and comfort, it is critical that these algorithms handle uncommon cases appropriately, but the low frequency and wide diversity of such situations present a significant challenge for these data-driven methods. To overcome this challenge, we propose modifications to the learning process that encourage these RL policies to maintain additional caution in unfamiliar situations. Specifically, we improve the Socially Attentive Reinforcement Learning (SARL) policy by (1) modifying the training process to systematically introduce deviations into a pedestrian model, (2) updating the value network to estimate and utilize pedestrian-unpredictability features, and (3) implementing a reward function to learn an effective response to pedestrian unpredictability. Compared to the original SARL policy, our modified policy maintains similar navigation times and path lengths, while reducing the number of collisions by 82% and reducing the proportion of time spent in the pedestrians' personal space by up to 19 percentage points for the most difficult cases. We also describe how to apply these modifications to other RL policies and demonstrate that some key high-level behaviors of our approach transfer to a physical robot.
RO · Aug 30, 2025 · Code
Mechanistic interpretability for steering vision-language-action models
Bear Häon, Kaylene Stocking, Ian Chuang et al.
Vision-Language-Action (VLA) models are a promising path to realizing generalist embodied agents that can quickly adapt to new tasks, modalities, and environments. However, methods for interpreting and steering VLAs fall far short of classical robotics pipelines, which are grounded in explicit models of kinematics, dynamics, and control. This lack of mechanistic insight is a central challenge for deploying learned policies in real-world robotics, where robustness and explainability are critical. Motivated by advances in mechanistic interpretability for large language models, we introduce the first framework for interpreting and steering VLAs via their internal representations, enabling direct intervention in model behavior at inference time. We project feedforward activations within transformer layers onto the token embedding basis, identifying sparse semantic directions - such as speed and direction - that are causally linked to action selection. Leveraging these findings, we introduce a general-purpose activation steering method that modulates behavior in real time, without fine-tuning, reward signals, or environment interaction. We evaluate this method on two recent open-source VLAs, Pi0 and OpenVLA, and demonstrate zero-shot behavioral control in simulation (LIBERO) and on a physical robot (UR5). This work demonstrates that interpretable components of embodied VLAs can be systematically harnessed for control - establishing a new paradigm for transparent and steerable foundation models in robotics.
OC · Apr 10, 2014 · Code
Practical Comparison of Optimization Algorithms for Learning-Based MPC with Linear Models
Anil Aswani, Patrick Bouffard, Xiaojing Zhang et al.
Learning-based control methods are an attractive approach for addressing performance and efficiency challenges in robotics and automation systems. One such technique that has found application in these domains is learning-based model predictive control (LBMPC). An important novelty of LBMPC lies in the fact that its robustness and stability properties are independent of the type of online learning used. This allows the use of advanced statistical or machine learning methods to provide the adaptation for the controller. This paper is concerned with providing practical comparisons of different optimization algorithms for implementing the LBMPC method, for the special case where the dynamic model of the system is linear and the online learning provides linear updates to the dynamic model. For comparison purposes, we have implemented a primal-dual infeasible start interior point method that exploits the sparsity structure of LBMPC. Our open source implementation (called LBmpcIPM) is available through a BSD license and is provided freely to enable the rapid implementation of LBMPC on other platforms. This solver is compared to the dense active set solvers LSSOL and qpOASES using a quadrotor helicopter platform. Two scenarios are considered: The first is a simulation comparing hovering control for the quadrotor, and the second is on-board control experiments of dynamic quadrotor flight. Though the LBmpcIPM method has better asymptotic computational complexity than LSSOL and qpOASES, we find that for certain integrated systems (like our quadrotor testbed) these methods can outperform LBmpcIPM. This suggests that actual benchmarks should be used when choosing which algorithm is used to implement LBMPC on practical systems.
LG · Mar 8, 2024
Unfamiliar Finetuning Examples Control How Language Models Hallucinate
Katie Kang, Eric Wallace, Claire Tomlin et al.
Large language models are known to hallucinate when faced with unfamiliar queries, but the underlying mechanisms that govern how models hallucinate are not yet fully understood. In this work, we find that unfamiliar examples in the models' finetuning data -- those that introduce concepts beyond the base model's scope of knowledge -- are crucial in shaping these errors. In particular, we find that an LLM's hallucinated predictions tend to mirror the responses associated with its unfamiliar finetuning examples. This suggests that by modifying how unfamiliar finetuning examples are supervised, we can influence a model's responses to unfamiliar queries (e.g., say ``I don't know''). We empirically validate this observation in a series of controlled experiments involving SFT, RL, and reward model finetuning on TriviaQA and MMLU. Our work further investigates RL finetuning strategies for improving the factuality of long-form model generations. We find that, while hallucinations from the reward model can significantly undermine the effectiveness of RL factuality finetuning, strategically controlling how reward models hallucinate can minimize these negative effects. Leveraging our previous observations on controlling hallucinations, we propose an approach for learning more reliable reward models, and show that they improve the efficacy of RL factuality finetuning in long-form biography and book/movie plot generation tasks.
RO · Sep 9, 2024
Competency-Aware Planning for Probabilistically Safe Navigation Under Perception Uncertainty
Sara Pohland, Claire Tomlin
Perception-based navigation systems are useful for unmanned ground vehicle (UGV) navigation in complex terrains, where traditional depth-based navigation schemes are insufficient. However, these data-driven methods are highly dependent on their training data and can fail in surprising and dramatic ways with little warning. To ensure the safety of the vehicle and the surrounding environment, it is imperative that the navigation system is able to recognize the predictive uncertainty of the perception model and respond safely and effectively in the face of uncertainty. In an effort to enable safe navigation under perception uncertainty, we develop a probabilistic and reconstruction-based competency estimation (PaRCE) method to estimate the model's level of familiarity with an input image as a whole and with specific regions in the image. We find that the overall competency score can correctly predict correctly classified, misclassified, and out-of-distribution (OOD) samples. We also confirm that the regional competency maps can accurately distinguish between familiar and unfamiliar regions across images. We then use this competency information to develop a planning and control scheme that enables effective navigation while maintaining a low probability of error. We find that the competency-aware scheme greatly reduces the number of collisions with unfamiliar obstacles, compared to a baseline controller with no competency awareness. Furthermore, the regional competency information is very valuable in enabling efficient navigation.
LG · Nov 12, 2024
What Do Learning Dynamics Reveal About Generalization in LLM Reasoning?
Katie Kang, Amrith Setlur, Dibya Ghosh et al. · berkeley, cmu
Despite the remarkable capabilities of modern large language models (LLMs), the mechanisms behind their problem-solving abilities remain elusive. In this work, we aim to better understand how the learning dynamics of LLM finetuning shape downstream generalization. Our analysis focuses on reasoning tasks, whose problem structure allows us to distinguish between memorization (the exact replication of reasoning steps from the training data) and performance (the correctness of the final solution). We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy: the accuracy of model samples on training queries before they begin to copy the exact reasoning steps from the training set. On the dataset level, this metric is able to reliably predict test accuracy, achieving $R^2$ of around or exceeding 0.9 across various models (Llama3 8B, Gemma2 9B), datasets (GSM8k, MATH), and training configurations. On a per-example level, this metric is also indicative of whether individual model predictions are robust to perturbations in the training query. By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies. We focus on data curation as an example, and show that prioritizing examples with low pre-memorization accuracy leads to 1.5-2x improvements in data efficiency compared to i.i.d. data scaling, and outperforms other standard data curation techniques.
CV · Nov 22, 2024
PaRCE: Probabilistic and Reconstruction-based Competency Estimation for CNN-based Image Classification
Sara Pohland, Claire Tomlin
Convolutional neural networks (CNNs) are extremely popular and effective for image classification tasks but tend to be overly confident in their predictions. Various works have sought to quantify uncertainty associated with these models, detect out-of-distribution (OOD) inputs, or identify anomalous regions in an image, but limited work has sought to develop a holistic approach that can accurately estimate perception model confidence across various sources of uncertainty. We develop a probabilistic and reconstruction-based competency estimation (PaRCE) method and compare it to existing approaches for uncertainty quantification and OOD detection. We find that our method can best distinguish between correctly classified, misclassified, and OOD samples with anomalous regions, as well as between samples with visual image modifications resulting in high, medium, and low prediction accuracy. We describe how to extend our approach for anomaly localization tasks and demonstrate the ability of our approach to distinguish regions in an image that are familiar to the perception model from those that are unfamiliar. We find that our method generates interpretable scores that most reliably capture a holistic notion of perception model confidence.
CV · Apr 7, 2025
Explaining Low Perception Model Competency with High-Competency Counterfactuals
Sara Pohland, Claire Tomlin
There exist many methods to explain how an image classification model generates its decision, but very little work has explored methods to explain why a classifier might lack confidence in its prediction. As there are various reasons the classifier might lose confidence, it would be valuable for this model to not only indicate its level of uncertainty but also explain why it is uncertain. Counterfactual images have been used to visualize changes that could be made to an image to generate a different classification decision. In this work, we explore the use of counterfactuals to offer an explanation for low model competency--a generalized form of predictive uncertainty that measures confidence. Toward this end, we develop five novel methods to generate high-competency counterfactual images, namely Image Gradient Descent (IGD), Feature Gradient Descent (FGD), Autoencoder Reconstruction (Reco), Latent Gradient Descent (LGD), and Latent Nearest Neighbors (LNN). We evaluate these methods across two unique datasets containing images with six known causes for low model competency and find Reco, LGD, and LNN to be the most promising methods for counterfactual generation. We further evaluate how these three methods can be utilized by pre-trained Multimodal Large Language Models (MLLMs) to generate language explanations for low model competency. We find that the inclusion of a counterfactual image in the language model query greatly increases the ability of the model to generate an accurate explanation for the cause of low model competency, thus demonstrating the utility of counterfactual images in explaining low perception model competency.
CR Jan 18, 2024
Hacking Predictors Means Hacking Cars: Using Sensitivity Analysis to Identify Trajectory Prediction Vulnerabilities for Autonomous Driving Security
Marsalis Gibson, David Babazadeh, Claire Tomlin et al.
Adversarial attacks on learning-based multi-modal trajectory predictors have already been demonstrated. However, there are still open questions about the effects of perturbations on inputs other than state histories, and how these attacks impact downstream planning and control. In this paper, we conduct a sensitivity analysis on two trajectory prediction models, Trajectron++ and AgentFormer. The analysis reveals that, among all inputs, almost all of the perturbation sensitivities for both models lie only within the most recent position and velocity states. We additionally demonstrate that, despite dominant sensitivity on state history perturbations, an undetectable image map perturbation made with the Fast Gradient Sign Method can induce large prediction error increases in both models, revealing that these trajectory predictors are, in fact, susceptible to image-based attacks. Using an optimization-based planner and example perturbations crafted from sensitivity results, we show how these attacks can cause a vehicle to come to a sudden stop from moderate driving speeds.
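For readers unfamiliar with the Fast Gradient Sign Method named in this abstract, the following is a minimal sketch of the perturbation rule only. It assumes a toy quadratic loss in place of a real trajectory predictor; `fgsm_perturb`, `loss`, `loss_grad`, `target`, and `epsilon` are hypothetical names chosen for illustration and do not come from the paper.

```python
import numpy as np

def fgsm_perturb(x, grad, epsilon):
    """Move x by epsilon along the sign of the loss gradient (one FGSM step)."""
    return x + epsilon * np.sign(grad)

# Toy stand-in for a model loss (assumption): L(x) = 0.5 * ||x - target||^2.
target = np.array([1.0, -2.0, 0.5])

def loss(x):
    return 0.5 * np.sum((x - target) ** 2)

def loss_grad(x):
    # Analytic gradient of the toy loss with respect to the input x.
    return x - target

x = np.zeros(3)
x_adv = fgsm_perturb(x, loss_grad(x), epsilon=0.1)

# The perturbation is bounded by epsilon in every coordinate,
# yet it is chosen to increase the loss as much as a first-order model allows.
print(np.max(np.abs(x_adv - x)), loss(x), loss(x_adv))
```

In a real attack the gradient would come from backpropagation through the predictor with respect to the map image, which this sketch does not attempt to reproduce.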
RO Jan 18, 2022
Inducing Structure in Reward Learning by Learning Features
Andreea Bobu, Marius Wiggert, Claire Tomlin et al.
Reward learning enables robots to learn adaptable behaviors from human input. Traditional methods model the reward as a linear function of hand-crafted features, but that requires specifying all the relevant features a priori, which is impossible for real-world tasks. To get around this issue, recent deep Inverse Reinforcement Learning (IRL) methods learn rewards directly from the raw state, but this is challenging because the robot has to implicitly learn both the features that are important and how to combine them at the same time. Instead, we propose a divide-and-conquer approach: focus human input specifically on learning the features separately, and only then learn how to combine them into a reward. We introduce a novel type of human input for teaching features and an algorithm that utilizes it to learn complex features from the raw state space. The robot can then learn how to combine them into a reward using demonstrations, corrections, or other reward learning frameworks. We demonstrate our method in settings where all features have to be learned from scratch, as well as where some of the features are known. By first focusing human input specifically on the feature(s), our method decreases sample complexity and improves generalization of the learned reward over a deep IRL baseline. We show this in experiments with a physical 7DOF robot manipulator, as well as in a user study conducted in a simulated environment.
LG Nov 27, 2021
Learning from learning machines: a new generation of AI technology to meet the needs of science
Luca Pion-Tonachini, Kristofer Bouchard, Hector Garcia Martin et al.
We outline emerging opportunities and challenges to enhance the utility of AI for scientific discovery. The distinct goals of AI for industry versus the goals of AI for science create tension between identifying patterns in data versus discovering patterns in the world from data. If we address the fundamental challenges associated with "bridging the gap" between domain-driven scientific models and data-driven AI learning machines, then we expect that these AI models can transform hypothesis generation, scientific discovery, and the scientific process itself.
SY Sep 22, 2021
Incorporating Data Uncertainty in Object Tracking Algorithms
Anish Muthali, Forrest Laine, Claire Tomlin
Methodologies for incorporating the uncertainties characteristic of data-driven object detectors into object tracking algorithms are explored. Object tracking methods rely on measurement error models, typically in the form of measurement noise, false positive rates, and missed detection rates. Each of these quantities, in general, can be dependent on object or measurement location. However, for detections generated from neural-network processed camera inputs, these measurement error statistics are not sufficient to represent the primary source of errors, namely a dissimilarity between run-time sensor input and the training data upon which the detector was trained. To this end, we investigate incorporating data uncertainty into object tracking methods so as to improve the ability to track objects, particularly those which are out-of-distribution with respect to the training data. The proposed methodologies are validated on an object tracking benchmark as well as in experiments with a real autonomous aircraft.
LG Sep 15, 2021
Multi-Task Learning with Sequence-Conditioned Transporter Networks
Michael H. Lim, Andy Zeng, Brian Ichter et al.
Enabling robots to solve multiple manipulation tasks has a wide range of industrial applications. While learning-based approaches enjoy flexibility and generalizability, scaling these approaches to solve such compositional tasks remains a challenge. In this work, we aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling. First, we propose a new benchmark suite specifically aimed at compositional tasks, MultiRavens, which allows defining custom task combinations through task modules that are inspired by industrial tasks and exemplify the difficulties in vision-based learning and planning methods. Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling and can efficiently learn to solve multi-task long horizon problems. Our analysis suggests not only that the new framework significantly improves pick-and-place performance on 10 novel multi-task benchmark problems, but also that multi-task learning with weighted sampling can vastly improve learning and agent performance on individual tasks.
OC Jan 8, 2021
The Computation of Approximate Generalized Feedback Nash Equilibria
Forrest Laine, David Fridovich-Keil, Chih-Yuan Chiu et al.
We present the concept of a Generalized Feedback Nash Equilibrium (GFNE) in dynamic games, extending the Feedback Nash Equilibrium concept to games in which players are subject to state and input constraints. We formalize necessary and sufficient conditions for (local) GFNE solutions at the trajectory level, which enable the development of efficient numerical methods for their computation. Specifically, we propose a Newton-style method for finding game trajectories which satisfy necessary conditions for an equilibrium, which can then be checked against sufficiency conditions. We show that the evaluation of the necessary conditions in general requires computing a series of nested, implicitly-defined derivatives, which quickly becomes intractable. To this end, we introduce an approximation to the necessary conditions which is amenable to efficient evaluation, and in turn, computation of solutions. We term the solutions to the approximate necessary conditions Generalized Feedback Quasi-Nash Equilibria (GFQNE), and we introduce numerical methods for their computation. In particular, we develop a Sequential Linear-Quadratic Game approach, in which an LQ local approximation of the game is solved at each iteration. The development of this method relies on the ability to compute a GFNE for inequality- and equality-constrained LQ games, and therefore specific methods for the solution of these special cases are developed in detail. We demonstrate the effectiveness of the proposed solution approach on a dynamic game arising in an autonomous driving application.
RO Nov 11, 2020
Multi-Hypothesis Interactions in Game-Theoretic Motion Planning
Forrest Laine, David Fridovich-Keil, Chih-Yuan Chiu et al.
We present a novel method for handling uncertainty about the intentions of non-ego players in dynamic games, with application to motion planning for autonomous vehicles. Equilibria in these games explicitly account for interaction among other agents in the environment, such as drivers and pedestrians. Our method models the uncertainty about the intention of other agents by constructing multiple hypotheses about the objectives and constraints of other agents in the scene. For each candidate hypothesis, we associate a Bernoulli random variable representing the probability of that hypothesis, which may or may not be independent of the probability of other hypotheses. We leverage constraint asymmetries and feedback information patterns to incorporate the probabilities of hypotheses in a natural way. Specifically, increasing the probability associated with a given hypothesis from $0$ to $1$ shifts the responsibility of collision avoidance from the hypothesized agent to the ego agent. This method allows the generation of interactive trajectories for the ego agent, where the level of assertiveness or caution that the ego exhibits is directly related to the easy-to-model uncertainty it maintains about the scene.
LG Nov 11, 2020
Testing for Typicality with Respect to an Ensemble of Learned Distributions
Forrest Laine, Claire Tomlin
Methods of performing anomaly detection on high-dimensional data sets are needed, since algorithms which are trained on data are only expected to perform well on data that is similar to the training data. There are theoretical results on the ability to detect if a population of data is likely to come from a known base distribution, which is known as the goodness-of-fit problem. One-sample approaches to this problem offer significant computational advantages for online testing, but require knowing a model of the base distribution. The ability to correctly reject anomalous data in this setting hinges on the accuracy of the model of the base distribution. For high dimensional data, learning an accurate-enough model of the base distribution such that anomaly detection works reliably is very challenging, as many researchers have noted in recent years. Existing methods for the one-sample goodness-of-fit problem do not account for the fact that a model of the base distribution is learned. To address that gap, we offer a theoretically motivated approach to account for the density learning procedure. In particular, we propose training an ensemble of density models, considering data to be anomalous if the data is anomalous with respect to any member of the ensemble. We provide a theoretical justification for this approach, proving first that a test on typicality is a valid approach to the goodness-of-fit problem, and then proving that for a correctly constructed ensemble of models, the intersection of typical sets of the models lies in the interior of the typical set of the base distribution. We present our method in the context of an example on synthetic data in which the effects we consider can easily be seen.
RO Nov 4, 2020
DeepReach: A Deep Learning Approach to High-Dimensional Reachability
Somil Bansal, Claire Tomlin
Hamilton-Jacobi (HJ) reachability analysis is an important formal verification method for guaranteeing performance and safety properties of dynamical control systems. Its advantages include compatibility with general nonlinear system dynamics, formal treatment of bounded disturbances, and the ability to deal with state and input constraints. However, it involves solving a PDE, whose computational and memory complexity scales exponentially with respect to the number of state variables, limiting its direct use to small-scale systems. We propose DeepReach, a method that leverages new developments in sinusoidal networks to develop a neural PDE solver for high-dimensional reachability problems. The computational requirements of DeepReach do not scale directly with the state dimension, but rather with the complexity of the underlying reachable tube. DeepReach achieves comparable results to the state-of-the-art reachability methods, does not require any explicit supervision for the PDE solution, can easily handle external disturbances, adversarial inputs, and system constraints, and also provides a safety controller for the system. We demonstrate DeepReach on a 9D multi-vehicle collision problem, and a 10D narrow passage problem, motivated by autonomous driving applications.
LG Oct 26, 2020
Expert Selection in High-Dimensional Markov Decision Processes
Vicenc Rubies-Royo, Eric Mazumdar, Roy Dong et al.
In this work we present a multi-armed bandit framework for online expert selection in Markov decision processes and demonstrate its use in high-dimensional settings. Our method takes a set of candidate expert policies and switches between them to rapidly identify the best performing expert using a variant of the classical upper confidence bound algorithm, thus ensuring low regret in the overall performance of the system. This is useful in applications where several expert policies may be available, and one needs to be selected at run-time for the underlying environment.
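The bandit view of expert selection described in this abstract can be sketched with the classical UCB1 rule. This is not the paper's variant, only the textbook algorithm it builds on. The sketch assumes each expert yields a Bernoulli reward with a fixed success probability; `ucb1_select`, `run_expert_selection`, and the probabilities used are hypothetical.

```python
import math
import random

def ucb1_select(counts, means, t, c=2.0):
    """Return the index of the expert with the highest upper confidence bound."""
    for i, n in enumerate(counts):
        if n == 0:
            return i  # play every expert once before using the bound
    return max(
        range(len(counts)),
        key=lambda i: means[i] + math.sqrt(c * math.log(t) / counts[i]),
    )

def run_expert_selection(success_probs, rounds, seed=0):
    """Simulate online selection among experts with Bernoulli rewards (assumption)."""
    rng = random.Random(seed)
    k = len(success_probs)
    counts = [0] * k   # number of times each expert was run
    means = [0.0] * k  # running average reward of each expert
    for t in range(1, rounds + 1):
        i = ucb1_select(counts, means, t)
        reward = 1.0 if rng.random() < success_probs[i] else 0.0
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]  # incremental mean update
    return counts

counts = run_expert_selection([0.2, 0.5, 0.9], rounds=2000)
print(counts)  # the best expert (index 2) should receive most of the pulls
```

The exploration bonus shrinks as an expert is tried more often, so weaker experts are sampled only logarithmically often, which is what keeps regret low.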
RO Jun 23, 2020
Feature Expansive Reward Learning: Rethinking Human Input
Andreea Bobu, Marius Wiggert, Claire Tomlin et al.
When a person is not satisfied with how a robot performs a task, they can intervene to correct it. Reward learning methods enable the robot to adapt its reward function online based on such human input, but they rely on handcrafted features. When the correction cannot be explained by these features, recent work in deep Inverse Reinforcement Learning (IRL) suggests that the robot could ask for task demonstrations and recover a reward defined over the raw state space. Our insight is that rather than implicitly learning about the missing feature(s) from demonstrations, the robot should instead ask for data that explicitly teaches it about what it is missing. We introduce a new type of human input in which the person guides the robot from states where the feature being taught is highly expressed to states where it is not. We propose an algorithm for learning the feature from the raw state space and integrating it into the reward function. By focusing the human input on the missing feature, our method decreases sample complexity and improves generalization of the learned reward over the above deep IRL baseline. We show this in experiments with a physical 7DOF robot manipulator, as well as in a user study conducted in a simulated environment.
RO Mar 20, 2020
Visual Navigation Among Humans with Optimal Control as a Supervisor
Varun Tolani, Somil Bansal, Aleksandra Faust et al.
Real-world visual navigation requires robots to operate in unfamiliar, human-occupied dynamic environments. Navigation around humans is especially difficult because it requires anticipating their future motion, which can be quite challenging. We propose an approach that combines learning-based perception with model-based optimal control to navigate among humans based only on monocular, first-person RGB images. Our approach is enabled by our novel data-generation tool, HumANav, which allows for photorealistic renderings of indoor environment scenes with humans in them, which are then used to train the perception module entirely in simulation. Through simulations and experiments on a mobile robot, we demonstrate that the learned navigation policies can anticipate and react to humans without explicitly predicting future human motion, generalize to previously unseen environments and human behaviors, and transfer directly from simulation to reality. Videos describing our approach and experiments, as well as a demo of HumANav, are available on the project website.
RO Dec 20, 2019
Generating Robust Supervision for Learning-Based Visual Navigation Using Hamilton-Jacobi Reachability
Anjian Li, Somil Bansal, Georgios Giovanis et al.
In Bansal et al. (2019), a novel visual navigation framework that combines learning-based and model-based approaches was proposed. Specifically, a Convolutional Neural Network (CNN) predicts a waypoint that is used by the dynamics model for planning and tracking a trajectory to the waypoint. However, the CNN inevitably makes prediction errors, which often lead to collisions in cluttered and tight spaces. In this paper, we present a novel Hamilton-Jacobi (HJ) reachability-based method to generate supervision for the CNN for waypoint prediction in an unseen environment. By modeling CNN prediction error as "disturbances" in the robot's dynamics, our generated waypoints are robust to these disturbances, and consequently to the prediction errors. Moreover, using globally optimal HJ reachability analysis leads to predicting waypoints that are time-efficient and avoid greedy behavior. Through simulations and hardware experiments, we demonstrate the advantages of the proposed approach in navigating through cluttered, narrow indoor environments.
RO Nov 16, 2019
Design of the First Insect-scale Spinning-wing Robot
Palak Bhushan, Claire Tomlin
Here we present the design of an insect-scale microrobot that generates lift by spinning its wings. This is in contrast to most other microrobot designs at this size scale, which rely on flapping wings to produce lift. The robot has a wing span of 4 centimeters and weighs 133 milligrams. It spins its wings at 47 revolutions/second, generating $>138$ milligrams of lift while consuming approximately 60 milliwatts of total power and operating at a low voltage ($<3$ V). Of the total power consumed, 8.8 milliwatts is mechanical power generated, part of which goes towards spinning the wings, and 51 milliwatts is wasted in resistive Joule heating. With a lift-to-power ratio of 2.3 grams/W, its performance is on par with the best reported flapping wing devices at the insect scale.
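The quoted lift-to-power ratio and power split can be checked directly from the figures in this abstract, taking the stated lower bound of 138 milligrams as the lift:

```latex
\[
\frac{138\ \text{mg}}{60\ \text{mW}} = 2.3\ \frac{\text{mg}}{\text{mW}} = 2.3\ \frac{\text{g}}{\text{W}},
\qquad
8.8\ \text{mW} + 51\ \text{mW} = 59.8\ \text{mW} \approx 60\ \text{mW}.
\]
```

The same lift figure also exceeds the 133 milligram body mass, which is consistent with the robot generating enough lift to support its own weight.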
RO Aug 9, 2019
New Wing Stroke and Wing Pitch Approaches for Milligram-scale Aerial Devices
Palak Bhushan, Claire Tomlin
Here we report the construction of the simplest transmission mechanism ever designed capable of converting linear motions of any actuator to $\pm$60$^\circ$ rotary wing stroke motion. It is planar, compliant, can be fabricated in a single step and requires no assembly. Further, its design is universal in nature, that is, it can be used with any linear actuator capable of delivering sufficient power, irrespective of the magnitude of actuator displacements. We also report a novel passive wing pitch mechanism whose motion has little dependence on the aerodynamic loading on the wing. This greatly simplifies the job of the designer by decoupling the as yet highly coupled wing morphology, wing kinematics and flexure stiffness parameters. Like the contemporary flexure-based methods, it is an add-on to a given wing stroke mechanism. Moreover, the intended wing pitch amplitude can easily be changed post-fabrication by tuning the resonance mass in the mechanism.
RO Aug 9, 2019
Design of the first sub-milligram flapping wing aerial vehicle
Palak Bhushan, Claire Tomlin
Here we report the first sub-milligram flapping wing vehicle which is able to mimic insect wing kinematics. A wing stroke amplitude of 90$^\circ$ and a wing pitch amplitude of 80$^\circ$ are demonstrated. This is also the smallest wing-span (single wing length of 3.5mm) device reported yet and is at the same mass-scale as a fruit fly. Assembly has been made simple and requires gluing together 5 components, in contrast to the higher part count and intensive assembly of other milligram-scale microrobots. This increases the fabrication speed and success rate of the fully fabricated device. Low operational voltages (70mV) make testing easier and will enable eventual deployment of autonomous sub-milligram aerial vehicles.
RO Aug 8, 2019
An Insect-scale Self-sufficient Rolling Microrobot
Palak Bhushan, Claire Tomlin
We design an insect-sized rolling microrobot driven by continuously rotating wheels. It measures 18mm$\times$8mm$\times$8mm. There are two versions of the robot: a 96mg laser-powered one and a 130mg supercapacitor-powered one. The robot can move at 27mm/s (1.5 body lengths per second) with wheels rotating at 300$^\circ$/s, while consuming an average power of 2.5mW. Neither version has any electrical wires coming out of it, and the supercapacitor-powered robot is also self-sufficient, able to roll freely for 8 seconds after a single charge. Low-voltage electromagnetic actuators (1V-3V) along with a novel double-ratcheting mechanism enable the operation of this device. It is, to the best of our knowledge, the lightest and fastest self-sufficient rolling microrobot reported yet.
RO Aug 8, 2019
An Insect-scale Untethered Laser-powered Jumping Microrobot
Palak Bhushan, Claire Tomlin
We present the design of an insect-sized jumping microrobot measuring 17mm$\times$6mm$\times$14mm and weighing 75 milligrams. The microrobot consumes 6.4mW of power to jump up by 8mm in height. The tethered version of the robot can jump 6 times per minute, each time landing perfectly on its feet. The untethered version of the robot is powered using onboard photovoltaic cells illuminated by an external infrared laser source. It is, to the best of our knowledge, the lightest untethered jumping microrobot with an onboard power source that has been reported yet.
RO Mar 6, 2019
Combining Optimal Control and Learning for Visual Navigation in Novel Environments
Somil Bansal, Varun Tolani, Saurabh Gupta et al.
Model-based control is a popular paradigm for robot navigation because it can leverage a known dynamics model to efficiently plan robust robot trajectories. However, it is challenging to use model-based methods in settings where the environment is a priori unknown and can only be observed partially through on-board sensors on the robot. In this work, we address this shortcoming by coupling model-based control with learning-based perception. The learning-based perception module produces a series of waypoints that guide the robot to the goal via a collision-free path. These waypoints are used by a model-based planner to generate a smooth and dynamically feasible trajectory that is executed on the physical system using feedback control. Our experiments in simulated real-world cluttered environments and on an actual ground vehicle demonstrate that the proposed approach can reach goal locations more reliably and efficiently in novel environments than purely geometric mapping-based or end-to-end learning-based alternatives. Our approach does not rely on detailed explicit 3D maps of the environment, works well with low frame rates, and generalizes well from simulation to the real world. Videos describing our approach and experiments are available on the project website.
SY Feb 20, 2019
Regression-based Inverter Control for Decentralized Optimal Power Flow and Voltage Regulation
Oscar Sondermeijer, Roel Dobbe, Daniel Arnold et al.
Electronic power inverters are capable of quickly delivering reactive power to maintain customer voltages within operating tolerances and to reduce system losses in distribution grids. This paper proposes a systematic and data-driven approach to determine reactive power inverter output as a function of local measurements in a manner that obtains near-optimal results. First, we use a network model together with historical load and generation data and solve an optimal power flow problem to compute globally optimal reactive power injections for all controllable inverters in the network. Subsequently, we use regression to find a function for each inverter that maps its local historical data to an approximation of its optimal reactive power injection. The resulting functions then serve as decentralized controllers in the participating inverters to predict the optimal injection based on new local measurements. The method achieves near-optimal results when performing voltage- and capacity-constrained loss minimization and voltage flattening, and allows for an efficient volt-VAR optimization (VVO) scheme in which legacy control equipment collaborates with existing inverters to facilitate safe operation of distribution networks with higher levels of distributed generation.
LG Feb 19, 2019
Fast Neural Network Verification via Shadow Prices
Vicenc Rubies-Royo, Roberto Calandra, Dusan M. Stipanovic et al.
To use neural networks in safety-critical settings it is paramount to provide assurances on their runtime operation. Recent work on ReLU networks has sought to verify whether inputs belonging to a bounded box can ever yield some undesirable output. Input-splitting procedures, a particular type of verification mechanism, do so by recursively partitioning the input set into smaller sets. The efficiency of these methods is largely determined by the number of splits the box must undergo before the property can be verified. In this work, we propose a new technique based on shadow prices that fully exploits the information of the problem yielding a more efficient generation of splits than the state-of-the-art. Results on the Airborne Collision Avoidance System (ACAS) benchmark verification tasks show a considerable reduction in the partitions generated which substantially reduces computation times. These results open the door to improved verification methods for a wide variety of machine learning applications including vision and control.
LG Sep 27, 2018
A Successive-Elimination Approach to Adaptive Robotic Sensing
Esther Rolf, David Fridovich-Keil, Max Simchowitz et al.
We study an adaptive source seeking problem, in which a mobile robot must identify the strongest emitter(s) of a signal in an environment with background emissions. Background signals may be highly heterogeneous and can mislead algorithms that are based on receding horizon control. We propose AdaSearch, a general algorithm for adaptive source seeking in the face of heterogeneous background noise. AdaSearch combines global trajectory planning with principled confidence intervals in order to concentrate measurements in promising regions while guaranteeing sufficient coverage of the entire area. Theoretical analysis shows that AdaSearch confers gains over a uniform sampling strategy when the distribution of background signals is highly variable. Simulation experiments demonstrate that when applied to the problem of radioactive source seeking, AdaSearch outperforms both uniform sampling and a receding time horizon information-maximization approach based on the current literature. We also demonstrate AdaSearch in hardware, providing further evidence of its potential for real-time implementation.
OC Sep 17, 2018
The Parallelization of Riccati Recursion
Forrest Laine, Claire Tomlin
A method is presented for parallelizing the computation of solutions to discrete-time, linear-quadratic, finite-horizon optimal control problems, which we will refer to as LQR problems. This class of problem arises frequently in robotic trajectory optimization. For very complicated robots, the size of the resulting problems can be large enough that computing the solution is prohibitively slow when using a single processor. Fortunately, approaches to solving these types of problems based on numerical solutions to the KKT conditions of optimality offer a parallel solution method and can leverage multiple processors to compute solutions faster. However, these methods do not produce the useful feedback control policies that are generated as a by-product of the dynamic-programming solution method known as Riccati recursion. In this paper we derive a method which is able to parallelize the computation of Riccati recursion, allowing for very fast solutions to the LQR problem while still generating feedback control policies. We demonstrate empirically that our method is faster than existing parallel methods.
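As background for this abstract, the standard sequential Riccati recursion for a finite-horizon, discrete-time LQR problem can be sketched as follows. This is the textbook single-processor baseline, not the parallel method the paper derives. The sketch assumes time-invariant dynamics $x_{t+1} = A x_t + B u_t$ with stage cost $x^\top Q x + u^\top R u$ and terminal cost $x^\top Q_f x$; the function name and the double-integrator example are hypothetical.

```python
import numpy as np

def riccati_recursion(A, B, Q, R, Qf, horizon):
    """Backward pass returning feedback gains K_t (u_t = -K_t x_t) and cost-to-go P_0."""
    P = Qf
    gains = []
    for _ in range(horizon):
        # Gain from minimizing the stage cost plus the cost-to-go at the next step.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Cost-to-go matrix one step earlier in time.
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # gains[t] now applies at time step t
    return gains, P

# Hypothetical example: double integrator sampled at 0.1 s.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, Qf = np.eye(2), np.array([[0.1]]), np.eye(2)

gains, P0 = riccati_recursion(A, B, Q, R, Qf, horizon=50)

# Roll out the closed loop from an initial offset using the feedback policy.
x = np.array([1.0, 0.0])
for K in gains:
    x = (A - B @ K) @ x
print(np.linalg.norm(x))
```

Each step depends on the cost-to-go from the step after it, which is the serial dependency that makes the recursion hard to parallelize.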
SY Sep 16, 2018
Efficient Computation of Feedback Control for Constrained Systems
Forrest Laine, Claire Tomlin
A method is presented for solving the discrete-time finite-horizon Linear Quadratic Regulator (LQR) problem subject to auxiliary linear equality constraints, such as fixed end-point constraints. The method explicitly determines an affine relationship between the control and state variables, as in standard Riccati recursion, giving rise to feedback control policies that account for constraints. Since the linearly-constrained LQR problem arises commonly in robotic trajectory optimization, having a method that can efficiently compute these solutions is important. We demonstrate some of the useful properties and interpretations of said control policies, and we compare the computation time of our method against existing methods.
LG Jun 14, 2018
Towards Distributed Energy Services: Decentralizing Optimal Power Flow with Machine Learning
Roel Dobbe, Oscar Sondermeijer, David Fridovich-Keil et al.
The implementation of optimal power flow (OPF) methods to perform voltage and power flow regulation in electric networks is generally believed to require extensive communication. We consider distribution systems with multiple controllable Distributed Energy Resources (DERs) and present a data-driven approach to learn control policies for each DER to reconstruct and mimic the solution to a centralized OPF problem from solely locally available information. Collectively, all local controllers closely match the centralized OPF solution, providing near optimal performance and satisfaction of system constraints. A rate distortion framework enables the analysis of how well the resulting fully decentralized control policies are able to reconstruct the OPF solution. The methodology provides a natural extension to decide what nodes a DER should communicate with to improve the reconstruction of its individual policy. The method is applied on both single- and three-phase test feeder networks using data from real loads and distributed generators, focusing on DERs that do not exhibit inter-temporal dependencies. It provides a framework for Distribution System Operators to efficiently plan and operate the contributions of DERs to achieve Distributed Energy Services in distribution networks.
LG Nov 5, 2017
On Identification of Distribution Grids
Omid Ardakanian, Vincent W. S. Wong, Roel Dobbe et al.
Large-scale integration of distributed energy resources into residential distribution feeders necessitates careful control of their operation through power flow analysis. While the knowledge of the distribution system model is crucial for this type of analysis, it is often unavailable or outdated. The recent introduction of synchrophasor technology in low-voltage distribution grids has created an unprecedented opportunity to learn this model from high-precision, time-synchronized measurements of voltage and current phasors at various locations. This paper focuses on joint estimation of model parameters (admittance values) and operational structure of a poly-phase distribution network from the available telemetry data via the lasso, a method for regression shrinkage and selection. We propose tractable convex programs capable of tackling the low rank structure of the distribution system and develop an online algorithm for early detection and localization of critical events that induce a change in the admittance matrix. The efficacy of these techniques is corroborated through power flow studies on four three-phase radial distribution systems serving real household demands.
IT Oct 24, 2017
A Sequential Approximation Framework for Coded Distributed Optimization
Jingge Zhu, Ye Pu, Vipul Gupta et al.
Building on the previous work of Lee et al. and Ferdinand et al. on coded computation, we propose a sequential approximation framework for solving optimization problems in a distributed manner. In a distributed computation system, latency caused by individual processors ("stragglers") usually causes a significant delay in the overall process. The proposed method is powered by a sequential computation scheme, which is designed specifically for systems with stragglers. This scheme has the desirable property that the user is guaranteed to receive useful (approximate) computation results whenever a processor finishes its subtask, even in the presence of uncertain latency. In this paper, we give a coding theorem for sequentially computing matrix-vector multiplications, and the optimality of this coding scheme is also established. As an application of the results, we demonstrate solving optimization problems using a sequential approximation approach, which accelerates the algorithm in a distributed system with stragglers.
LG Sep 10, 2017
MBMF: Model-Based Priors for Model-Free Reinforcement Learning
Somil Bansal, Roberto Calandra, Kurtland Chua et al.
Reinforcement Learning is divided into two main paradigms: model-free and model-based. Each of these two paradigms has strengths and limitations, and has been successfully applied to real-world domains that are appropriate to its corresponding strengths. In this paper, we present a new approach aimed at bridging the gap between these two paradigms. We aim to take the best of the two paradigms and combine them in an approach that is at the same time data-efficient and cost-savvy. We do so by learning a probabilistic dynamics model and leveraging it as a prior for the intertwined model-free optimization. As a result, our approach can exploit the generality and structure of the dynamics model, but is also capable of ignoring its inevitable inaccuracies, by directly incorporating the evidence provided by the direct observation of the cost. Preliminary results demonstrate that our approach outperforms purely model-based and model-free approaches, as well as the approach of simply switching from a model-based to a model-free setting.