RO AI Jul 15, 2021

Minimizing Safety Interference for Safe and Comfortable Automated Driving with Distributional Reinforcement Learning

arXiv:2107.07316v1, 31 citations
Originality: Incremental advance
AI Analysis

This work addresses safety-critical autonomous driving by reducing overly conservative behavior while maintaining safety guarantees, though it is incremental in that it applies distributional RL to a known bottleneck.

The paper tackles the challenge of balancing safety and comfort in autonomous driving by proposing a distributional reinforcement learning framework that adapts conservativity at run-time. The resulting policies drive 8 seconds faster on average than a normal DQN policy and require 83% less safety interference than a rule-based policy, at the cost of a slightly longer average crossing time.

Despite recent advances in reinforcement learning (RL), its application in safety-critical domains such as autonomous vehicles remains challenging. Although penalizing RL agents for risky situations can help them learn safe policies, it may also lead to highly conservative behavior. In this paper, we propose a distributional RL framework for learning adaptive policies that can tune their level of conservativity at run-time based on the desired comfort and utility. Using a proactive safety verification approach, the proposed framework can guarantee that actions generated by RL are fail-safe under worst-case assumptions. Concurrently, the policy is encouraged to minimize safety interference and generate more comfortable behavior. We trained and evaluated the proposed approach and baseline policies in a high-level simulator with a variety of randomized scenarios, including several corner cases that rarely happen in reality but are crucial. Our experiments show that the behavior of policies learned using distributional RL can be adapted at run-time and is robust to environment uncertainty. Quantitatively, the learned distributional RL agent drives on average 8 seconds faster than the normal DQN policy and requires 83% less safety interference than the rule-based policy, while slightly increasing the average crossing time. We also study the sensitivity of the learned policy in environments with higher perception noise and show that our algorithm learns policies that can still drive reliably when the perception noise is twice as high as in the training configuration, for automated merging and crossing at occluded intersections.
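
To make the idea of run-time adjustable conservativity concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which risk measure is used, so the sketch assumes conditional value-at-risk (CVaR) computed over equally weighted return quantiles, and the names `cvar`, `select_action`, `is_fail_safe`, and `fallback` are hypothetical. A small `alpha` makes the agent judge each action by its worst outcomes, `alpha = 1.0` recovers the ordinary expected return, and a separate safety check overrides the chosen action when it cannot be verified as fail-safe.

```python
import math
from typing import Callable, Dict, List, Tuple


def cvar(quantiles: List[float], alpha: float) -> float:
    """Mean of the worst alpha-fraction of equally weighted return quantiles.

    alpha = 1.0 gives the plain mean (risk-neutral); smaller alpha is more
    conservative. CVaR is an assumed risk measure, chosen for illustration.
    """
    if not quantiles:
        raise ValueError("quantiles must not be empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(quantiles)
    k = max(1, math.ceil(alpha * len(ordered)))  # number of worst quantiles kept
    return sum(ordered[:k]) / k


def select_action(
    quantiles_per_action: Dict[str, List[float]],
    alpha: float,
    is_fail_safe: Callable[[str], bool],  # hypothetical safety verifier
    fallback: str,                        # hypothetical fail-safe action
) -> Tuple[str, bool]:
    """Return (action, interfered).

    The action with the highest CVaR is preferred. If the verifier rejects it,
    the fallback is returned and interfered is True, which counts as one
    safety interference.
    """
    preferred = max(
        quantiles_per_action,
        key=lambda a: cvar(quantiles_per_action[a], alpha),
    )
    if is_fail_safe(preferred):
        return preferred, False
    return fallback, True


if __name__ == "__main__":
    # Made-up return quantiles for two actions at an intersection.
    q = {"go": [-10.0, 4.0, 8.0, 10.0], "wait": [1.0, 2.0, 2.0, 3.0]}
    always_safe = lambda action: True
    print(select_action(q, 1.0, always_safe, "brake"))   # risk-neutral
    print(select_action(q, 0.25, always_safe, "brake"))  # conservative
```

With these made-up numbers, the risk-neutral setting prefers "go" because its mean return is higher, while the conservative setting prefers "wait" because the worst outcome of "go" is much lower. The same learned return distributions serve both settings, which is what allows the level of conservativity to be changed at run-time without retraining.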

