Multi-Timescale Hierarchical Reinforcement Learning for Unified Behavior and Control of Autonomous Driving
This work addresses the challenge of unified behavior and control for autonomous driving systems, representing an incremental improvement over existing RL methods by incorporating hierarchical policy design.
The paper tackles the problem of fluctuating driving behavior and suboptimal control in RL-based autonomous driving by proposing a multi-timescale hierarchical RL approach, which significantly improves driving efficiency, action consistency, and safety in simulated highway scenarios.
Reinforcement Learning (RL) is increasingly used in autonomous driving (AD) and shows clear advantages. However, most RL-based AD methods overlook policy structure design. An RL policy that outputs only short-timescale vehicle control commands produces fluctuating driving behavior because the network outputs themselves fluctuate, while one that outputs only long-timescale driving goals cannot achieve unified optimality of driving behavior and control. We therefore propose a multi-timescale hierarchical reinforcement learning approach. It adopts a hierarchical policy structure in which high- and low-level RL policies are trained in a unified manner to produce long-timescale motion guidance and short-timescale control commands, respectively. The motion guidance is explicitly represented by hybrid actions, which capture multimodal driving behaviors on structured roads and support incremental low-level extended-state updates. In addition, a hierarchical safety mechanism is designed to ensure multi-timescale safety. Evaluation in simulator-based and HighD dataset-based multi-lane highway scenarios demonstrates that our approach significantly improves AD performance, increasing driving efficiency, action consistency, and safety.
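The two-timescale structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the policies are hand-written placeholders rather than trained networks, and the hybrid-action encoding (a discrete lane choice plus a continuous target speed), the guidance period, the gains, and the acceleration limits are all assumptions chosen for illustration. The sketch shows only the control flow, in which the high-level policy refreshes its motion guidance every few steps while the low-level policy issues a control command at every step.

```python
import random
from dataclasses import dataclass


@dataclass
class MotionGuidance:
    """Hybrid action (assumed encoding): discrete lane choice + continuous speed."""
    lane_change: int     # -1 = left, 0 = keep lane, +1 = right
    target_speed: float  # metres per second


def high_level_policy(state, rng):
    """Hypothetical stand-in for a learned long-timescale policy."""
    return MotionGuidance(
        lane_change=rng.choice([-1, 0, 1]),
        target_speed=rng.uniform(20.0, 30.0),
    )


def low_level_policy(state, guidance):
    """Hypothetical stand-in for a learned short-timescale policy.

    Tracks the guided speed with a proportional rule and clips the
    acceleration to assumed limits of [-3, 2] m/s^2.
    """
    accel = 0.5 * (guidance.target_speed - state["speed"])
    accel = max(-3.0, min(2.0, accel))
    steer = 0.1 * guidance.lane_change
    return accel, steer


def rollout(num_steps=20, guidance_period=5, dt=0.1, seed=0):
    """Run the hierarchy: guidance every `guidance_period` steps, control every step."""
    rng = random.Random(seed)
    state = {"speed": 25.0}
    guidance = None
    log = []
    for t in range(num_steps):
        if t % guidance_period == 0:
            # Long timescale: pick new motion guidance.
            guidance = high_level_policy(state, rng)
        # Short timescale: compute a control command under the current guidance.
        accel, steer = low_level_policy(state, guidance)
        state["speed"] += accel * dt
        log.append((t, guidance, accel, steer))
    return log
```

Calling `rollout(num_steps=20, guidance_period=5)` yields 20 control commands governed by 4 guidance decisions. Holding the guidance fixed between refreshes is what keeps the behavior consistent over the long timescale while the control commands still adapt at every step.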