RO · AI · LG · May 18, 2025

Distributional Soft Actor-Critic with Harmonic Gradient for Safe and Efficient Autonomous Driving in Multi-lane Scenarios

arXiv:2505.13532v1 · h-index: 22 · 2025 IEEE Intelligent Vehicles Symposium (IV)
Originality: Incremental advance
AI Analysis

This work addresses safety constraints in reinforcement learning for autonomous driving and represents an incremental improvement over existing RL methods.

The paper tackles the challenge of handling safety constraints in reinforcement learning for autonomous driving. It proposes harmonic policy iteration (HPI), which balances the policy gradients for driving efficiency and for safety, and integrates it with the Distributional Soft Actor-Critic (DSAC) algorithm to form DSAC-H, which achieves near-zero safety violations in multi-lane simulations.

Reinforcement learning (RL), known for its self-evolution capability, offers a promising approach to training high-level autonomous driving systems. However, handling constraints remains a significant challenge for existing RL algorithms, particularly in real-world applications. In this paper, we propose a new safety-oriented training technique called harmonic policy iteration (HPI). At each RL iteration, it first calculates two policy gradients associated with efficient driving and safety constraints, respectively. Then, a harmonic gradient is derived for policy updating, minimizing conflicts between the two gradients and consequently enabling a more balanced and stable training process. Furthermore, we adopt the state-of-the-art DSAC algorithm as the backbone and integrate it with our HPI to develop a new safe RL algorithm, DSAC-H. Extensive simulations in multi-lane scenarios demonstrate that DSAC-H achieves efficient driving performance with near-zero safety constraint violations.
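The abstract does not state the exact rule HPI uses to derive the harmonic gradient. As a minimal sketch of the general idea of reconciling two conflicting policy gradients, the code below swaps in a PCGrad-style projection ("gradient surgery"): when the efficiency and safety gradients point against each other (negative dot product), each one has its component along the other removed before the two are summed. This is an assumed stand-in, not the paper's HPI formula, and every name here (`harmonic_gradient`, `project_out`, `g_eff`, `g_safe`, `policy_step`) is hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out(g: Vector, h: Vector) -> Vector:
    """Remove from g its component along h (projection onto h's normal plane)."""
    denom = dot(h, h)
    if denom == 0.0:
        return list(g)
    coef = dot(g, h) / denom
    return [gi - coef * hi for gi, hi in zip(g, h)]


def harmonic_gradient(g_eff: Vector, g_safe: Vector) -> Vector:
    """Combine efficiency and safety gradients (PCGrad-style, an assumption).

    If the two gradients do not conflict, they are simply summed. If they
    conflict, each is projected onto the normal plane of the *original*
    other gradient, so the sum has a non-negative inner product with both.
    """
    if dot(g_eff, g_safe) >= 0.0:
        return [a + b for a, b in zip(g_eff, g_safe)]
    g_eff_p = project_out(g_eff, g_safe)
    g_safe_p = project_out(g_safe, g_eff)
    return [a + b for a, b in zip(g_eff_p, g_safe_p)]


def policy_step(theta: Vector, g_eff: Vector, g_safe: Vector, lr: float) -> Vector:
    """Hypothetical gradient-ascent update of policy parameters theta,
    assuming both gradients point toward higher objective values."""
    h = harmonic_gradient(g_eff, g_safe)
    return [t + lr * hi for t, hi in zip(theta, h)]


if __name__ == "__main__":
    g_eff = [1.0, 0.0]    # toy gradient favoring efficient driving
    g_safe = [-1.0, 1.0]  # toy gradient favoring constraint satisfaction
    h = harmonic_gradient(g_eff, g_safe)
    print(h)  # [0.5, 1.5]
    print(dot(h, g_eff) >= 0.0, dot(h, g_safe) >= 0.0)  # True True
```

With these toy vectors the raw sum `[0.0, 1.0]` would make no progress on the efficiency direction, while the projected combination keeps a non-negative component along both objectives. For two gradients this holds in general: the combined vector is orthogonal-plus-aligned with each input by the Cauchy-Schwarz inequality. Whether DSAC-H weights or normalizes the two gradients differently is not specified in the abstract.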
