Goal-constrained Sparse Reinforcement Learning for End-to-End Driving
This work addresses the problem of reward engineering in autonomous driving and is aimed at researchers and practitioners. It appears incremental, as it builds on existing sparse-reward and curriculum-learning methods.
The paper tackles the challenge of deep reinforcement learning for end-to-end driving by using goal-constrained sparse rewards to avoid complex reward engineering, and it demonstrates generalization to unseen road layouts and significantly longer driving distances than in training.
Deep reinforcement learning for end-to-end driving is limited by the need for complex reward engineering. Sparse rewards can circumvent this challenge but suffer from long training times and lead to sub-optimal policies. In this work, we explore full-control driving with only a goal-constrained sparse reward and propose a curriculum learning approach for end-to-end driving using only navigation-view maps, which benefit from a small virtual-to-real domain gap. To address the complexity of multiple driving policies, we learn concurrent individual policies that are selected at inference by a navigation system. We demonstrate the ability of our proposal to generalize to unseen road layouts and to drive significantly longer than in training.
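The three ingredients named in the abstract (a goal-constrained sparse reward, a curriculum, and per-maneuver policies chosen by a navigation system) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the binary reward value, the goal radius, the linear distance schedule, and the command strings are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical policy type: maps an observation to (steering, throttle).
Policy = Callable[[Sequence[float]], Tuple[float, float]]


def sparse_goal_reward(position: Point, goal: Point, goal_radius: float = 2.0) -> float:
    """Return 1.0 only when the vehicle is within goal_radius of the goal.

    No shaping terms (speed, lane offset, heading) are used; every other
    state yields 0.0. The radius of 2.0 is an assumed value.
    """
    return 1.0 if math.dist(position, goal) <= goal_radius else 0.0


def curriculum_goal_distance(stage: int, start: float = 10.0,
                             step: float = 10.0, cap: float = 100.0) -> float:
    """Assumed linear curriculum: the goal moves farther away at each stage."""
    return min(start + stage * step, cap)


def select_policy(policies: Dict[str, Policy], command: str) -> Policy:
    """Pick the individual policy matching the navigation system's command."""
    return policies[command]


# Placeholder policies standing in for separately trained networks.
policies: Dict[str, Policy] = {
    "straight": lambda obs: (0.0, 0.5),
    "left": lambda obs: (-0.3, 0.3),
    "right": lambda obs: (0.3, 0.3),
}

steer, throttle = select_policy(policies, "left")([0.0, 0.0])
```

Under these assumptions the agent receives a learning signal only on reaching the goal, so starting with nearby goals and extending them gradually is one plausible way to keep early episodes from being reward-free.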