SY · Oct 5, 2017
Collaborative Platooning of Automated Vehicles Using Variable Time-Gaps
Aria HasanzadeZonuzy, Sina Arefizadeh, Alireza Talebpour et al.
Connected automated vehicles (CAVs) could potentially be coordinated to safely attain the maximum traffic flow on roadways under dynamic traffic patterns, such as those engendered by the merger of two strings of vehicles due to a lane drop. Strings of vehicles have to be shaped correctly in terms of the inter-vehicular time-gap and velocity to ensure that such operation is feasible. However, controllers that can achieve such traffic shaping over the multiple dimensions of target time-gap and velocity over a region of space are unknown. The objective of this work is to design such a controller, and to show that we can design candidate time-gap and velocity profiles such that the controller can stabilize the string of vehicles in attaining the target profiles. Our analysis is based on studying the system in the spatial rather than the time domain, which enables us to study stability in terms of minimizing errors from the target profile and across vehicles as a function of location. Finally, we conduct numerical simulations in the context of shaping two platoons for merger, which we use to illustrate how to select time-gap and velocity profiles for maximizing flow and maintaining safety.
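The link between time-gap, velocity, and flow mentioned in the abstract can be illustrated with the standard steady-state relation for a string of identical vehicles. This is a minimal sketch and not the paper's controller: the function names, the 5 m default vehicle length, and the convention that the time-gap is measured from the leader's rear bumper to the follower's front bumper are all assumptions made for illustration.

```python
def spacing_m(speed_mps, time_gap_s, vehicle_length_m):
    """Front-to-front spacing when each vehicle keeps a fixed time-gap.

    Assumption: the time-gap is the time the follower needs to cover the
    clear distance to the leader, so spacing = length + speed * gap.
    """
    return vehicle_length_m + speed_mps * time_gap_s


def flow_veh_per_hour(speed_mps, time_gap_s, vehicle_length_m=5.0):
    """Steady-state flow past a fixed point: speed divided by spacing."""
    if speed_mps <= 0:
        return 0.0
    spacing = spacing_m(speed_mps, time_gap_s, vehicle_length_m)
    return 3600.0 * speed_mps / spacing


if __name__ == "__main__":
    # Compare two candidate time-gaps at the same speed (30 m/s).
    for gap in (1.0, 2.0):
        print(gap, round(flow_veh_per_hour(30.0, gap), 1))
```

Under this relation, a larger time-gap at a given speed lowers the flow. By conservation of vehicles, two strings feeding a single lane after a lane drop cannot jointly carry more than the merged lane's flow, which is one reason the gaps have to be opened before the merge point.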
SY · Feb 1, 2018
Assessing Strong String Stability of Constant Spacing Policy under Speed Limit Fluctuations
Sina Arefizadeh, Aria Hasanzadezonuzy, Alireza Talebpour et al.
The speed limit changes frequently throughout the transportation network, due to either safety (e.g., change in geometry) or congestion management (e.g., speed harmonization systems). Any abrupt reduction in the speed limit can create a shockwave that propagates upstream in traffic. Dealing with such an abrupt reduction in speed limit is particularly important while designing control laws for a platoon of automated vehicles from both stability and efficiency perspectives. This paper focuses on Adaptive Cruise Control (ACC) based platooning under a constant spacing policy, and investigates the possibility of designing a controller that ensures stability, while tracking a given target velocity profile that changes as a function of location. An ideal controller should maintain a constant spacing between successive vehicles, while tracking the desired velocity profile. The analytical investigations of this paper suggest that such a controller does not exist.
LG · Dec 1, 2021
DOPE: Doubly Optimistic and Pessimistic Exploration for Safe Reinforcement Learning
Archana Bura, Aria HasanzadeZonuzy, Dileep Kalathil et al.
Safe reinforcement learning is extremely challenging--not only must the agent explore an unknown environment, it must do so while ensuring no safety constraint violations. We formulate this safe reinforcement learning (RL) problem using the framework of a finite-horizon Constrained Markov Decision Process (CMDP) with an unknown transition probability function, where we model the safety requirements as constraints on the expected cumulative costs that must be satisfied during all episodes of learning. We propose a model-based safe RL algorithm that we call Doubly Optimistic and Pessimistic Exploration (DOPE), and show that it achieves an objective regret $\tilde{O}(|\mathcal{S}|\sqrt{|\mathcal{A}| K})$ without violating the safety constraints during learning, where $|\mathcal{S}|$ is the number of states, $|\mathcal{A}|$ is the number of actions, and $K$ is the number of learning episodes. Our key idea is to combine a reward bonus for exploration (optimism) with a conservative constraint (pessimism), in addition to the standard optimistic model-based exploration. DOPE is not only able to improve the objective regret bound, but also shows a significant empirical performance improvement as compared to earlier optimism-pessimism approaches.
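For readers unfamiliar with the setting, a generic finite-horizon CMDP and the associated objective regret can be written as follows. This is a sketch in assumed notation rather than the paper's own: the horizon $H$, per-step reward $r_h$, per-step cost $c_h$, threshold $\bar{C}$, value $V^{\pi}$, and episode policies $\pi_k$ are symbols introduced here, and the problem is written as reward maximization (a cost-minimization form differs only in sign).

```latex
% Constrained problem: maximize expected cumulative reward subject to
% a bound on the expected cumulative cost.
\pi^{*} \in \arg\max_{\pi} \;
  \mathbb{E}^{\pi}\!\left[ \sum_{h=1}^{H} r_h(s_h, a_h) \right]
\quad \text{s.t.} \quad
  \mathbb{E}^{\pi}\!\left[ \sum_{h=1}^{H} c_h(s_h, a_h) \right] \le \bar{C}

% Objective regret over K episodes, with V^{\pi} the expected cumulative
% reward of policy \pi under the true (unknown) transition function.
\mathrm{Regret}(K) = \sum_{k=1}^{K} \left( V^{\pi^{*}} - V^{\pi_k} \right)

% "No violation during learning" means every \pi_k satisfies the cost
% constraint under the true transition function, for k = 1, \dots, K.
```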
LG · Aug 1, 2020
Learning with Safety Constraints: Sample Complexity of Reinforcement Learning for Constrained MDPs
Aria HasanzadeZonuzy, Archana Bura, Dileep Kalathil et al.
Many physical systems have underlying safety considerations that require that the policy employed ensures the satisfaction of a set of constraints. The analytical formulation usually takes the form of a Constrained Markov Decision Process (CMDP). We focus on the case where the CMDP is unknown, and RL algorithms obtain samples to discover the model and compute an optimal constrained policy. Our goal is to characterize the relationship between safety constraints and the number of samples needed to ensure a desired level of accuracy -- both objective maximization and constraint satisfaction -- in a PAC sense. We explore two classes of RL algorithms, namely, (i) a generative model based approach, wherein samples are taken initially to estimate a model, and (ii) an online approach, wherein the model is updated as samples are obtained. Our main finding is that compared to the best-known bounds of the unconstrained regime, the sample complexity of constrained RL algorithms is increased by a factor that is logarithmic in the number of constraints, which suggests that the approach may be easily utilized in real systems.
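The first step of a generative-model approach, taking a fixed number of samples per state-action pair and forming an empirical transition model, might look like the sketch below. It is illustrative only: the function names, the two-state toy MDP, and its transition probabilities are hypothetical, and the subsequent step of solving the constrained problem on the estimated model (typically a linear program) is omitted.

```python
import random
from collections import Counter


def estimate_transitions(sample_next_state, states, actions, n_samples):
    """Empirical transition model from n_samples draws per (state, action)."""
    model = {}
    for s in states:
        for a in actions:
            counts = Counter(sample_next_state(s, a) for _ in range(n_samples))
            # Relative frequencies; unseen next states get probability 0.
            model[(s, a)] = {s2: counts[s2] / n_samples for s2 in states}
    return model


def toy_sampler(rng):
    """Hypothetical two-state, two-action MDP used as the generative model."""
    true_p = {
        (0, 0): [0.9, 0.1],
        (0, 1): [0.2, 0.8],
        (1, 0): [0.5, 0.5],
        (1, 1): [0.0, 1.0],
    }

    def sample(s, a):
        return rng.choices([0, 1], weights=true_p[(s, a)])[0]

    return sample


rng = random.Random(0)
model = estimate_transitions(toy_sampler(rng), [0, 1], [0, 1], n_samples=2000)
```

The number of samples per pair, `n_samples`, is the quantity a sample-complexity bound would prescribe; here it is simply fixed by hand.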