RO · Jul 23, 2024
SECRM-2D: RL-Based Efficient and Comfortable Route-Following Autonomous Driving with Analytic Safety Guarantees
Tianyu Shi, Ilia Smirnov, Omar ElSamadisy et al.
Over the last decade, there has been increasing interest in autonomous driving systems. Reinforcement Learning (RL) shows great promise for training autonomous driving controllers because it can directly optimize a combination of criteria such as efficiency, comfort, and stability. However, RL-based controllers typically offer no safety guarantees, which makes their readiness for real deployment questionable. In this paper, we propose SECRM-2D (the Safe, Efficient and Comfortable RL-based driving Model with Lane-Changing), an RL autonomous driving controller (both longitudinal and lateral) that balances efficiency and comfort and follows a fixed route while being subject to hard analytic safety constraints. These safety constraints are derived from the criterion that the follower vehicle must keep enough headway to avoid a crash if the leader vehicle brakes suddenly. We evaluate SECRM-2D against several learning and non-learning baselines in simulated test scenarios, including freeway driving, exiting, merging, and emergency braking. Our results confirm that representative previously published RL AV controllers may crash in both training and testing, even when they optimize a safety objective. By contrast, SECRM-2D avoids crashes during both training and testing, improves over the baselines on measures of efficiency and comfort, and follows the prescribed route more faithfully. In addition, we develop a theoretical understanding of the longitudinal steady state of a collection of SECRM-2D vehicles.
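The headway criterion can be made concrete with a standard kinematic worst-case check. The sketch below is an illustrative assumption, not the SECRM-2D constraint itself. It assumes a constant follower reaction time `reaction_time` and constant maximum decelerations `b_follower` and `b_leader` with `b_follower <= b_leader`. It also uses hypothetical function names. Under these assumptions, comparing the two vehicles' final stopping positions is enough to rule out a collision, and the largest safe follower speed is the positive root of a quadratic.

```python
import math


def min_safe_gap(v_follower, v_leader, reaction_time, b_follower, b_leader):
    """Smallest gap (m) that lets the follower stop behind a leader that
    starts braking at its maximum rate right now.

    Assumption: constant decelerations with b_follower <= b_leader, so it is
    enough to compare the two final stopping positions.
    """
    # Follower travels at constant speed during the reaction time, then brakes.
    follower_stop = v_follower * reaction_time + v_follower**2 / (2 * b_follower)
    # Leader brakes immediately.
    leader_stop = v_leader**2 / (2 * b_leader)
    return max(0.0, follower_stop - leader_stop)


def max_safe_speed(gap, v_leader, reaction_time, b_follower, b_leader):
    """Largest follower speed v with
    v*tau + v^2/(2*b_f) <= gap + v_leader^2/(2*b_l),
    i.e. the positive root of v^2 + 2*b_f*tau*v - 2*b_f*budget = 0."""
    budget = gap + v_leader**2 / (2 * b_leader)
    b, tau = b_follower, reaction_time
    return -b * tau + math.sqrt((b * tau) ** 2 + 2 * b * budget)
```

An RL policy's proposed speed could then be capped, for example with `min(proposed_speed, max_safe_speed(...))`. This capping rule is only one possible way to impose such a constraint on an RL action.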
LG · Sep 1, 2024
Generalized Multi-hop Traffic Pressure for Heterogeneous Traffic Perimeter Control
Xiaocan Li, Xiaoyu Wang, Ilia Smirnov et al.
Perimeter control (PC) prevents loss of traffic network capacity due to congestion in urban areas. Homogeneous PC assigns the same permitted inflow to every access point of a protected region. However, homogeneous PC performs poorly when congestion in the protected region is heterogeneous (e.g., under imbalanced demand), because it does not consider the specific traffic conditions around each perimeter intersection. When the protected region has spatially heterogeneous congestion, the perimeter inflow rate should be higher near low-density areas and lower near high-density areas. A naïve approach is to use 1-hop traffic pressure to measure the traffic condition around perimeter intersections, but such a metric is too spatially myopic for PC. To address this issue, we formulate multi-hop downstream pressure grounded in Markov chain theory, which "looks deeper" into the protected region beyond the perimeter intersections. In addition, we formulate a two-stage hierarchical control scheme that uses this multi-hop pressure to redistribute the total permitted inflow provided by a pre-trained deep reinforcement learning homogeneous control policy. Experimental results show that our heterogeneous PC approaches based on multi-hop pressure significantly outperform homogeneous PC in scenarios where the origin-destination flows are highly imbalanced with high spatial heterogeneity. Moreover, a sensitivity analysis shows that our approach is robust to turning-ratio uncertainties.
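One way to read "multi-hop downstream pressure grounded in Markov chain theory" is to treat links as Markov chain states with a turning-ratio matrix `P`. Under that reading, the metric averages the expected density a vehicle will meet over the next few hops. The sketch below follows that reading only as an assumption, and several of its choices are illustrative rather than the paper's definitions or its two-stage scheme:

- the decay weights;
- the zero-row convention for links where vehicles leave the network;
- the inverse-pressure split of the total inflow;
- the function names, which are hypothetical.

The toy network shows why depth matters. Gate 0 leads to a lightly loaded link and then a jammed one. Gate 1 leads to a moderately loaded link and then a free one.

```python
import numpy as np


def multi_hop_downstream_density(P, density, hops, decay=0.5):
    """Discounted average of k-step expected downstream densities, k = 1..hops.

    P[i, j] is the turning probability from link i to link j; rows may sum to
    less than 1 where vehicles leave the network (they then contribute 0).
    (P^k @ density)[i] is the expected density of the link that a vehicle now
    on link i reaches after exactly k moves.
    """
    weights = decay ** np.arange(hops)
    weights = weights / weights.sum()
    result = np.zeros(len(density))
    Pk = np.eye(len(density))
    for k in range(hops):
        Pk = Pk @ P
        result += weights[k] * (Pk @ density)
    return result


def split_inflow(total_inflow, gate_links, pressure, eps=1e-6):
    """Give each perimeter gate a share of total_inflow inversely
    proportional to its downstream pressure (illustrative rule only)."""
    inverse = 1.0 / (pressure[gate_links] + eps)
    return total_inflow * inverse / inverse.sum()


# Toy network: gate 0 -> link 2 -> link 4, gate 1 -> link 3 -> link 5.
P = np.zeros((6, 6))
P[0, 2] = P[1, 3] = P[2, 4] = P[3, 5] = 1.0
density = np.array([0.0, 0.0, 0.1, 0.3, 0.9, 0.1])
gates = [0, 1]

for hops in (1, 2):
    pressure = multi_hop_downstream_density(P, density, hops)
    # hops=1: gate 0 looks less congested (0.1 vs 0.3) and gets more inflow;
    # hops=2: the jam behind gate 0 becomes visible and the split reverses.
    print(hops, pressure[gates], split_inflow(900.0, gates, pressure))
```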
LG · Nov 10, 2024
Multi-hop Upstream Anticipatory Traffic Signal Control with Deep Reinforcement Learning
Xiaocan Li, Xiaoyu Wang, Ilia Smirnov et al.
Coordination in traffic signal control is crucial for managing congestion in urban networks. Existing pressure-based control methods consider only the immediately upstream links, which leads to suboptimal green-time allocation and increased network delays. Effective signal control, however, requires coordination across a broader spatial scope: upstream traffic should influence signal decisions at downstream intersections, and those decisions in turn affect a large area of the traffic network. Although agent communication using neural-network-based feature extraction can implicitly enhance spatial awareness, it significantly increases learning complexity, adding another layer of difficulty to the already challenging task of control with deep reinforcement learning. To address both the learning complexity and the myopic definition of traffic pressure, our work introduces a novel concept based on Markov chain theory, namely multi-hop upstream pressure, which generalizes conventional pressure to account for traffic conditions beyond the immediately upstream links. This farsighted and compact metric informs the deep reinforcement learning agent so that it preemptively clears multi-hop upstream queues, guiding it to optimize signal timings with broader spatial awareness. Simulations on synthetic and realistic (Toronto) scenarios demonstrate that controllers using multi-hop upstream pressure significantly reduce overall network delay by prioritizing traffic movements based on a broader understanding of upstream congestion.
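Multi-hop upstream pressure can be sketched with the same Markov chain view, looking backward instead of forward. Entry i of (P^k)^T q is the expected number of vehicles now queued on any link u that reach link i after exactly k moves. In the sketch below, the decay factor, the hypothetical function names, and the greedy max-pressure phase pick are assumptions made only to make the metric concrete. In the paper, the metric informs a deep reinforcement learning agent rather than driving a fixed rule. The toy intersection has two approaches with equal local queues. Only the view one hop deeper reveals the long queue feeding link 2.

```python
import numpy as np


def multi_hop_upstream_queue(P, queue, hops, decay=0.5):
    """Local queue on each link plus discounted queues further upstream.

    P[u, i] is the turning probability from link u to link i; rows may sum
    to less than 1 where vehicles leave the network.
    ((P^k).T @ queue)[i] = sum_u P^k[u, i] * queue[u] is the expected number
    of vehicles now on link u that arrive at link i after exactly k moves.
    """
    result = queue.astype(float)
    Pk = np.eye(len(queue))
    for k in range(1, hops + 1):
        Pk = Pk @ P
        result += decay**k * (Pk.T @ queue)
    return result


def phase_pressures(phases, upstream_queue, queue):
    """Pressure of each phase: sum over its movements (in_link, out_link) of
    (multi-hop upstream queue on in_link) - (queue on out_link)."""
    return [float(sum(upstream_queue[i] - queue[j] for i, j in phase))
            for phase in phases]


# Toy network: link 0 feeds approach 2, link 1 feeds approach 3,
# and both approaches discharge into exit link 4.
P = np.zeros((5, 5))
P[0, 2] = P[1, 3] = P[2, 4] = P[3, 4] = 1.0
queue = np.array([20, 0, 5, 5, 0])
phases = [[(2, 4)], [(3, 4)]]  # phase 0 serves link 2, phase 1 serves link 3

for hops in (0, 1):
    up = multi_hop_upstream_queue(P, queue, hops)
    pressures = phase_pressures(phases, up, queue)
    # hops=0: the two phases tie; hops=1: the queue on link 0 favors phase 0.
    print(hops, pressures, "-> serve phase", int(np.argmax(pressures)))
```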