AIJul 30, 2022
Unified Automatic Control of Vehicular Systems with Reinforcement LearningZhongxia Yan, Abdul Rahman Kreidieh, Eugene Vinitsky et al.
Emerging vehicular systems with increasing proportions of automated components present opportunities for optimal control to mitigate congestion and increase efficiency. There has been a recent interest in applying deep reinforcement learning (DRL) to these nonlinear dynamical systems for the automatic design of effective control strategies. Despite conceptual advantages of DRL being model-free, studies typically nonetheless rely on training setups that are painstakingly specialized to specific vehicular systems. This is a key challenge to efficient analysis of diverse vehicular and mobility systems. To this end, this article contributes a streamlined methodology for vehicular microsimulation and discovers high performance control strategies with minimal manual design. A variable-agent, multi-task approach is presented for optimization of vehicular Partially Observed Markov Decision Processes. The methodology is experimentally validated on mixed autonomy traffic systems, where fractions of vehicles are automated; empirical improvement, typically 15-60% over a human driving baseline, is observed in all configurations of six diverse open or closed traffic systems. The study reveals numerous emergent behaviors resembling wave mitigation, traffic signaling, and ramp metering. Finally, the emergent behaviors are analyzed to produce interpretable control strategies, which are validated against the learned control strategies.
SYJun 22, 2020
Zero-Shot Autonomous Vehicle Policy Transfer: From Simulation to Real-World via Adversarial LearningBehdad Chalaki, Logan E. Beaver, Ben Remer et al.
In this article, we demonstrate a zero-shot transfer of an autonomous driving policy from simulation to University of Delaware's scaled smart city with adversarial multi-agent reinforcement learning, in which an adversary attempts to decrease the net reward by perturbing both the inputs and outputs of the autonomous vehicles during training. We train the autonomous vehicles to coordinate with each other while crossing a roundabout in the presence of an adversary in simulation. The adversarial policy successfully reproduces the simulated behavior and incidentally outperforms, in terms of travel time, both a human-driving baseline and adversary-free trained policies. Finally, we demonstrate that the addition of adversarial training considerably improves the performance \eat{stability and robustness} of the policies after transfer to the real world compared to Gaussian noise injection.
ROJun 28, 2022
Learning energy-efficient driving behaviors by imitating expertsAbdul Rahman Kreidieh, Zhe Fu, Alexandre M. Bayen
The rise of vehicle automation has generated significant interest in the potential role of future automated vehicles (AVs). In particular, in highly dense traffic settings, AVs are expected to serve as congestion-dampeners, mitigating the presence of instabilities that arise from various sources. However, in many applications, such maneuvers rely heavily on non-local sensing or coordination by interacting AVs, thereby rendering their adaptation to real-world settings a particularly difficult challenge. To address this challenge, this paper examines the role of imitation learning in bridging the gap between such control strategies and realistic limitations in communication and sensing. Treating one such controller as an "expert", we demonstrate that imitation learning can succeed in deriving policies that, if adopted by 5% of vehicles, may boost the energy-efficiency of networks with varying traffic conditions by 15% using only local observations. Results and code are available online at https://sites.google.com/view/il-traffic/home.
LGMay 29, 2025
(U)NFV: Supervised and Unsupervised Neural Finite Volume Methods for Solving Hyperbolic PDEsNathan Lichtlé, Alexi Canesse, Zhe Fu et al.
We introduce (U)NFV, a modular neural network architecture that generalizes classical finite volume (FV) methods for solving hyperbolic conservation laws. Hyperbolic partial differential equations (PDEs) are challenging to solve, particularly conservation laws whose physically relevant solutions contain shocks and discontinuities. FV methods are widely used for their mathematical properties: convergence to entropy solutions, flow conservation, or total variation diminishing, but often lack accuracy and flexibility in complex settings. Neural Finite Volume addresses these limitations by learning update rules over extended spatial and temporal stencils while preserving conservation structure. It supports both supervised training on solution data (NFV) and unsupervised training via weak-form residual loss (UNFV). Applied to first-order conservation laws, (U)NFV achieves up to 10x lower error than Godunov's method, outperforms ENO/WENO, and rivals discontinuous Galerkin solvers with far less complexity. On traffic modeling problems, both from PDEs and from experimental highway data, (U)NFV captures nonlinear wave dynamics with significantly higher fidelity and scalability than traditional FV approaches.
SYJan 18, 2024
Traffic Smoothing Controllers for Autonomous Vehicles Using Deep Reinforcement Learning and Real-World Trajectory DataNathan Lichtlé, Kathy Jang, Adit Shah et al.
Designing traffic-smoothing cruise controllers that can be deployed onto autonomous vehicles is a key step towards improving traffic flow, reducing congestion, and enhancing fuel efficiency in mixed autonomy traffic. We bypass the common issue of having to carefully fine-tune a large traffic microsimulator by leveraging real-world trajectory data from the I-24 highway in Tennessee, replayed in a one-lane simulation. Using standard deep reinforcement learning methods, we train energy-reducing wave-smoothing policies. As an input to the agent, we observe the speed and distance of only the vehicle in front, which are local states readily available on most recent vehicles, as well as non-local observations about the downstream state of the traffic. We show that at a low 4% autonomous vehicle penetration rate, we achieve significant fuel savings of over 15% on trajectories exhibiting many stop-and-go waves. Finally, we analyze the smoothing effect of the controllers and demonstrate robustness to adding lane-changing into the simulation as well as the removal of downstream information.
SYDec 29, 2021
Reachability Analysis for FollowerStopper: Safety Analysis and Experimental ResultsFang-Chieh Chou, Marsalis Gibson, Rahul Bhadani et al.
Motivated by earlier work and the developer of a new algorithm, the FollowerStopper, this article uses reachability analysis to verify the safety of the FollowerStopper algorithm, which is a controller designed for dampening stop- and-go traffic waves. With more than 1100 miles of driving data collected by our physical platform, we validate our analysis results by comparing it to human driving behaviors. The FollowerStopper controller has been demonstrated to dampen stop-and-go traffic waves at low speed, but previous analysis on its relative safety has been limited to upper and lower bounds of acceleration. To expand upon previous analysis, reachability analysis is used to investigate the safety at the speeds it was originally tested and also at higher speeds. Two formulations of safety analysis with different criteria are shown: distance-based and time headway-based. The FollowerStopper is considered safe with distance-based criterion. However, simulation results demonstrate that the FollowerStopper is not representative of human drivers - it follows too closely behind vehicles, specifically at a distance human would deem as unsafe. On the other hand, under the time headway-based safety analysis, the FollowerStopper is not considered safe anymore. A modified FollowerStopper is proposed to satisfy time-based safety criterion. Simulation results of the proposed FollowerStopper shows that its response represents human driver behavior better.
LGFeb 18, 2020
ResiliNet: Failure-Resilient Inference in Distributed Neural NetworksAshkan Yousefpour, Brian Q. Nguyen, Siddartha Devic et al.
Federated Learning aims to train distributed deep models without sharing the raw data with the centralized server. Similarly, in distributed inference of neural networks, by partitioning the network and distributing it across several physical nodes, activations and gradients are exchanged between physical nodes, rather than raw data. Nevertheless, when a neural network is partitioned and distributed among physical nodes, failure of physical nodes causes the failure of the neural units that are placed on those nodes, which results in a significant performance drop. Current approaches focus on resiliency of training in distributed neural networks. However, resiliency of inference in distributed neural networks is less explored. We introduce ResiliNet, a scheme for making inference in distributed neural networks resilient to physical node failures. ResiliNet combines two concepts to provide resiliency: skip hyperconnection, a concept for skipping nodes in distributed neural networks similar to skip connection in resnets, and a novel technique called failout, which is introduced in this paper. Failout simulates physical node failure conditions during training using dropout, and is specifically designed to improve the resiliency of distributed neural networks. The results of the experiments and ablation studies using three datasets confirm the ability of ResiliNet to provide inference resiliency for distributed neural networks.
LGDec 5, 2019
Inter-Level Cooperation in Hierarchical Reinforcement LearningAbdul Rahman Kreidieh, Glen Berseth, Brandon Trabucco et al.
Hierarchies of temporally decoupled policies present a promising approach for enabling structured exploration in complex long-term planning problems. To fully achieve this approach an end-to-end training paradigm is needed. However, training these multi-level policies has had limited success due to challenges arising from interactions between the goal-assigning and goal-achieving levels within a hierarchy. In this article, we consider the policy optimization process as a multi-agent process. This allows us to draw on connections between communication and cooperation in multi-agent RL, and demonstrate the benefits of increased cooperation between sub-policies on the training performance of the overall policy. We introduce a simple yet effective technique for inducing inter-level cooperation by modifying the objective function and subsequent gradients of higher-level policies. Experimental results on a wide variety of simulated robotics and traffic control tasks demonstrate that inducing cooperation results in stronger performing policies and increased sample efficiency on a set of difficult long time horizon tasks. We also find that goal-conditioned policies trained using our method display better transfer to new tasks, highlighting the benefits of our method in learning task-agnostic lower-level behaviors. Videos and code are available at: https://sites.google.com/berkeley.edu/cooperative-hrl.
NISep 3, 2019
Guardians of the Deep Fog: Failure-Resilient DNN Inference from Edge to CloudAshkan Yousefpour, Siddartha Devic, Brian Q. Nguyen et al.
Partitioning and distributing deep neural networks (DNNs) over physical nodes such as edge, fog, or cloud nodes, could enhance sensor fusion, and reduce bandwidth and inference latency. However, when a DNN is distributed over physical nodes, failure of the physical nodes causes the failure of the DNN units that are placed on these nodes. The performance of the inference task will be unpredictable, and most likely, poor, if the distributed DNN is not specifically designed and properly trained for failures. Motivated by this, we introduce deepFogGuard, a DNN architecture augmentation scheme for making the distributed DNN inference task failure-resilient. To articulate deepFogGuard, we introduce the elements and a model for the resiliency of distributed DNN inference. Inspired by the concept of residual connections in DNNs, we introduce skip hyperconnections in distributed DNNs, which are the basis of deepFogGuard's design to provide resiliency. Next, our extensive experiments using two existing datasets for the sensing and vision applications confirm the ability of deepFogGuard to provide resiliency for distributed DNNs in edge-cloud networks.
SYJul 24, 2017
Integration of Information Patterns in the Modeling and Design of Mobility Management ServicesAlexander Keimer, Nicolas Laurent-Brouty, Farhad Farokhi et al.
Over the last decade, the rise of the mobile internet and the usage of mobile devices has enabled ubiquitous traffic information. With the increased adoption of specific smartphone applications, the number of users of routing applications has become large enough to disrupt traffic flow patterns in a significant manner. Similarly, but at a slightly slower pace, novel services for freight transportation and city logistics improve the efficiency of goods transportation and change the use of road infrastructure. The present article provides a general four-layer framework for modeling these new trends. The main motivation behind the development is to provide a unifying formal system description that can at the same time encompass system physics (flow and motion of vehicles) as well as coordination strategies under various information and cooperation structures. To showcase the framework, we apply it to the specific challenge of modeling and analyzing the integration of routing applications in today's transportation systems. In this framework, at the lowest layer (flow dynamics) we distinguish app users from non-app users. A distributed parameter model based on a non-local partial differential equation is introduced and analyzed. The second layer incorporates connected services (e.g., routing) and other applications used to optimize the local performance of the system. As inputs to those applications, we propose a third layer introducing the incentive design and global objectives, which are typically varying over the day depending on road and weather conditions, external events etc. The high-level planning is handled on the fourth layer taking social long-term objectives into account.
AIJan 30, 2017
Expert Level control of Ramp Metering based on Multi-task Deep Reinforcement LearningFrancois Belletti, Daniel Haziza, Gabriel Gomes et al.
This article shows how the recent breakthroughs in Reinforcement Learning (RL) that have enabled robots to learn to play arcade video games, walk or assemble colored bricks, can be used to perform other tasks that are currently at the core of engineering cyberphysical systems. We present the first use of RL for the control of systems modeled by discretized non-linear Partial Differential Equations (PDEs) and devise a novel algorithm to use non-parametric control techniques for large multi-agent systems. We show how neural network based RL enables the control of discretized PDEs whose parameters are unknown, random, and time-varying. We introduce an algorithm of Mutual Weight Regularization (MWR) which alleviates the curse of dimensionality of multi-agent control schemes by sharing experience between agents while giving each agent the opportunity to specialize its action policy so as to tailor it to the local parameters of the part of the system it is located in.
LGMar 10, 2016
Scalable Linear Causal Inference for Irregularly Sampled Time Series with Long Range DependenciesFrancois W. Belletti, Evan R. Sparks, Michael J. Franklin et al.
Linear causal analysis is central to a wide range of important application spanning finance, the physical sciences, and engineering. Much of the existing literature in linear causal analysis operates in the time domain. Unfortunately, the direct application of time domain linear causal analysis to many real-world time series presents three critical challenges: irregular temporal sampling, long range dependencies, and scale. Moreover, real-world data is often collected at irregular time intervals across vast arrays of decentralized sensors and with long range dependencies which make naive time domain correlation estimators spurious. In this paper we present a frequency domain based estimation framework which naturally handles irregularly sampled data and long range dependencies while enabled memory and communication efficient distributed processing of time series data. By operating in the frequency domain we eliminate the need to interpolate and help mitigate the effects of long range dependencies. We implement and evaluate our new work-flow in the distributed setting using Apache Spark and demonstrate on both Monte Carlo simulations and high-frequency financial trading that we can accurately recover causal structure at scale.
CRJan 15, 2016
Differential Privacy of Populations in Routing GamesRoy Dong, Walid Krichene, Alexandre M. Bayen et al.
As our ground transportation infrastructure modernizes, the large amount of data being measured, transmitted, and stored motivates an analysis of the privacy aspect of these emerging cyber-physical technologies. In this paper, we consider privacy in the routing game, where the origins and destinations of drivers are considered private. This is motivated by the fact that this spatiotemporal information can easily be used as the basis for inferences for a person's activities. More specifically, we consider the differential privacy of the mapping from the amount of flow for each origin-destination pair to the traffic flow measurements on each link of a traffic network. We use a stochastic online learning framework for the population dynamics, which is known to converge to the Nash equilibrium of the routing game. We analyze the sensitivity of this process and provide theoretical guarantees on the convergence rates as well as differential privacy values for these models. We confirm these with simulations on a small example.
LGJul 31, 2014
Learning Nash Equilibria in Congestion GamesWalid Krichene, Benjamin Drighès, Alexandre M. Bayen
We study the repeated congestion game, in which multiple populations of players share resources, and make, at each iteration, a decentralized decision on which resources to utilize. We investigate the following question: given a model of how individual players update their strategies, does the resulting dynamics of strategy profiles converge to the set of Nash equilibria of the one-shot game? We consider in particular a model in which players update their strategies using algorithms with sublinear discounted regret. We show that the resulting sequence of strategy profiles converges to the set of Nash equilibria in the sense of Cesàro means. However, strong convergence is not guaranteed in general. We show that strong convergence can be guaranteed for a class of algorithms with a vanishing upper bound on discounted regret, and which satisfy an additional condition. We call such algorithms AREP algorithms, for Approximate REPlicator, as they can be interpreted as a discrete-time approximation of the replicator equation, which models the continuous-time evolution of population strategies, and which is known to converge for the class of congestion games. In particular, we show that the discounted Hedge algorithm belongs to the AREP class, which guarantees its strong convergence.
HCJun 22, 2014
Environmental Sensing by Wearable Device for Indoor Activity and Location EstimationMing Jin, Han Zou, Kevin Weekly et al.
We present results from a set of experiments in this pilot study to investigate the causal influence of user activity on various environmental parameters monitored by occupant carried multi-purpose sensors. Hypotheses with respect to each type of measurements are verified, including temperature, humidity, and light level collected during eight typical activities: sitting in lab / cubicle, indoor walking / running, resting after physical activity, climbing stairs, taking elevators, and outdoor walking. Our main contribution is the development of features for activity and location recognition based on environmental measurements, which exploit location- and activity-specific characteristics and capture the trends resulted from the underlying physiological process. The features are statistically shown to have good separability and are also information-rich. Fusing environmental sensing together with acceleration is shown to achieve classification accuracy as high as 99.13%. For building applications, this study motivates a sensor fusion paradigm for learning individualized activity, location, and environmental preferences for energy management and user comfort.