Offline Reinforcement Learning for Autonomous Driving with Safety and Exploration Enhancement
This work addresses safety and exploration challenges in offline RL for autonomous driving, offering an incremental improvement over existing methods.
The paper tackles the problem of conservative policies and limited exploration in offline reinforcement learning for autonomous driving by enhancing BCQ with learnable parameter noise and a Lyapunov-based safety strategy; the resulting method outperforms conventional RL and state-of-the-art offline RL algorithms in highway and parking scenarios.
Reinforcement learning (RL) is a powerful data-driven control method that has been widely explored in autonomous driving tasks. However, conventional RL approaches learn control policies through trial-and-error interactions with the environment and may therefore cause disastrous consequences, such as collisions, when tested in real-world traffic. Offline RL has recently emerged as a promising framework for learning effective policies from previously collected, static datasets without requiring active interaction, which makes it especially appealing for autonomous driving applications. Despite their promise, existing offline RL algorithms such as Batch-Constrained deep Q-learning (BCQ) generally produce rather conservative policies with limited exploration efficiency. To address these issues, this paper presents an enhanced BCQ algorithm that employs a learnable parameter noise scheme in the perturbation model to increase the diversity of observed actions. In addition, a Lyapunov-based safety enhancement strategy is incorporated to constrain the explorable state space within a safe region. Experimental results in highway and parking traffic scenarios show that our approach outperforms the conventional RL method as well as state-of-the-art offline RL algorithms.
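The abstract does not specify how the parameter noise or the Lyapunov constraint is implemented, so the following is only a minimal sketch of how the two ideas could fit into BCQ-style action selection, under stated assumptions. In BCQ, candidate actions come from a generative model and a perturbation model nudges each one by at most Φ before the critic picks the best. Here the candidates are passed in directly. The perturbation model is a single NoisyNet-style layer with factorised Gaussian weight noise whose scales (`w_sigma`, `b_sigma`) would be trained in a real implementation; this is a swapped-in technique, not necessarily the paper's scheme. The safety test assumes a hypothetical quadratic Lyapunov function V(s) = sᵀPs, assumed linear dynamics s' = As + Ba, and a decrease condition V(s') ≤ (1 − α)V(s). All names (`NoisyLinear`, `perturb`, `lyapunov_safe`, `select_action`, `q_fn`) are hypothetical.

```python
import numpy as np


class NoisyLinear:
    """Linear layer with learnable-scale factorised Gaussian weight noise
    (NoisyNet-style). Assumed stand-in for the paper's parameter noise."""

    def __init__(self, n_in, n_out, sigma0=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        bound = 1.0 / np.sqrt(n_in)
        self.w_mu = self.rng.uniform(-bound, bound, (n_out, n_in))
        self.b_mu = self.rng.uniform(-bound, bound, n_out)
        # Noise scales: these would be trainable parameters in practice.
        self.w_sigma = np.full((n_out, n_in), sigma0 * bound)
        self.b_sigma = np.full(n_out, sigma0 * bound)

    @staticmethod
    def _scale(x):
        # f(x) = sign(x) * sqrt(|x|), as in factorised NoisyNet noise.
        return np.sign(x) * np.sqrt(np.abs(x))

    def __call__(self, x):
        eps_in = self._scale(self.rng.standard_normal(self.w_mu.shape[1]))
        eps_out = self._scale(self.rng.standard_normal(self.w_mu.shape[0]))
        w = self.w_mu + self.w_sigma * np.outer(eps_out, eps_in)
        b = self.b_mu + self.b_sigma * eps_out
        return w @ x + b


def perturb(layer, s, a, phi=0.05, max_action=1.0):
    """BCQ-style perturbation: a + phi * max_action * tanh(xi(s, a)), clipped."""
    xi = layer(np.concatenate([s, a]))
    return np.clip(a + phi * max_action * np.tanh(xi), -max_action, max_action)


def next_value(s, a, A, B, P):
    """V(s') for the hypothetical dynamics s' = A s + B a and V(s) = s^T P s."""
    s_next = A @ s + B @ a
    return s_next @ P @ s_next


def lyapunov_safe(s, a, A, B, P, alpha=0.1):
    """Assumed decrease condition: V(s') <= (1 - alpha) * V(s)."""
    return next_value(s, a, A, B, P) <= (1.0 - alpha) * (s @ P @ s)


def select_action(s, candidates, layer, q_fn, A, B, P):
    """Perturb candidates, keep Lyapunov-safe ones, return the argmax-Q action.
    If no perturbed candidate is safe, fall back to the one with smallest V(s')."""
    perturbed = [perturb(layer, s, a) for a in candidates]
    safe = [a for a in perturbed if lyapunov_safe(s, a, A, B, P)]
    if not safe:
        return min(perturbed, key=lambda a: next_value(s, a, A, B, P))
    return max(safe, key=lambda a: q_fn(s, a))


# Usage with toy, hypothetical values.
s = np.array([1.0, 0.5])
A, B, P = np.eye(2), np.eye(2), np.eye(2)
candidates = [np.array([-0.5, -0.2]), np.array([0.3, 0.3]), np.array([-0.9, -0.4])]
q_fn = lambda s, a: -np.sum(a ** 2)  # hypothetical critic favouring small actions
layer = NoisyLinear(n_in=4, n_out=2)
chosen = select_action(s, candidates, layer, q_fn, A, B, P)
```

In this toy setting the second candidate drives V upward and is filtered out, and the critic prefers the smaller of the two remaining safe actions. Because `tanh` is bounded, the learnable noise can only move each action by at most `phi * max_action`, which keeps exploration close to the dataset's actions in the spirit of BCQ's batch constraint.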