CV · RO · Aug 18, 2021

End-to-End Urban Driving by Imitating a Reinforcement Learning Coach

arXiv:2108.08265v3 · 318 citations
Originality: Highly original
AI Analysis

This work addresses the challenge of generating high-quality supervision for end-to-end driving algorithms, which is crucial for improving autonomous vehicle performance in urban environments.

The paper tackles the problem of training end-to-end autonomous driving agents by introducing a reinforcement learning expert that supervises imitation learning more effectively than human demonstrations or rule-based automated experts. An agent trained under this supervision reaches a 78% success rate on the NoCrash-dense benchmark and state-of-the-art performance on the public routes of the CARLA LeaderBoard.

End-to-end approaches to autonomous driving commonly rely on expert demonstrations. Although humans are good drivers, they are not good coaches for end-to-end algorithms that demand dense on-policy supervision. On the contrary, automated experts that leverage privileged information can efficiently generate large scale on-policy and off-policy demonstrations. However, existing automated experts for urban driving make heavy use of hand-crafted rules and perform suboptimally even on driving simulators, where ground-truth information is available. To address these issues, we train a reinforcement learning expert that maps bird's-eye view images to continuous low-level actions. While setting a new performance upper-bound on CARLA, our expert is also a better coach that provides informative supervision signals for imitation learning agents to learn from. Supervised by our reinforcement learning coach, a baseline end-to-end agent with monocular camera-input achieves expert-level performance. Our end-to-end agent achieves a 78% success rate while generalizing to a new town and new weather on the NoCrash-dense benchmark and state-of-the-art performance on the challenging public routes of the CARLA LeaderBoard.

Code Implementations: 3 repos