RO, Sep 3, 2021

Iterative Imitation Policy Improvement for Interactive Autonomous Driving

arXiv:2109.01288v1
Originality: Incremental advance
AI Analysis

This work addresses the problem of safe and efficient policy improvement for autonomous driving systems, though it is incremental as it builds on existing imitation learning and data aggregation methods.

The paper tackles the challenge of improving imitation learning policies for autonomous driving in urban traffic by proposing a system that uses a weak simulator for safe online data collection and an A* planner as a pseudo-expert to reduce human labeling. The results show significant performance improvement over a baseline Behavioral Cloning policy.

We propose an imitation learning system for autonomous driving in urban traffic with interactions. We train a Behavioral Cloning (BC) policy to imitate driving behavior collected from real urban traffic, and apply the data aggregation algorithm to improve its performance iteratively. Applying data aggregation in this setting comes with two challenges. The first challenge is that it is expensive and dangerous to collect online rollout data in real urban traffic. Creating similar traffic scenarios in a simulator like CARLA for online rollout collection can also be difficult. Instead, we propose to create a weak simulator from the training dataset, in which all the surrounding vehicles follow the trajectories recorded in the dataset. We find that the online data collected in such a simulator can still be used to improve the BC policy's performance. The second challenge is the tedious and time-consuming process of human labelling during online rollout. To solve this problem, we use an A* planner as a pseudo-expert to provide expert-like demonstrations. We validate our proposed imitation learning system in real urban traffic scenarios. The experimental results show that our system can significantly improve the performance of the baseline BC policy.
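The loop described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the two-lane grid world, the tabular majority-vote "BC" policy, the observation encoding, and every name below (`step`, `observe`, `astar_action`, `dagger`, and so on) are assumptions made up for illustration. It only shows the shape of the idea: surrounding vehicles replay a logged trajectory (the weak simulator), the current policy is rolled out against that replay, an A* planner labels the visited states (the pseudo-expert), and the labels are aggregated before refitting.

```python
import heapq
from collections import Counter, defaultdict

# Hypothetical toy world: ego moves on a (lane, x) grid, one action per time step.
ACTIONS = [(0, 1), (-1, 1), (1, 1), (0, 0)]  # (d_lane, d_x): forward, left, right, wait
N_LANES, GOAL_X, HORIZON = 2, 6, 12

def occupied(log, t):
    """Weak simulator: other vehicles occupy exactly the logged cells at time t."""
    return log.get(t, set())

def step(state, action, log):
    """Advance the ego one step; returns (next_state, crashed)."""
    t, lane, x = state
    lane2, x2 = lane + action[0], x + action[1]
    crashed = not (0 <= lane2 < N_LANES) or (lane2, x2) in occupied(log, t + 1)
    return (t + 1, lane2, x2), crashed

def observe(state, log):
    """Local observation: ego lane plus next-step occupancy of the three cells ahead."""
    t, lane, x = state
    nxt = occupied(log, t + 1)
    return (lane, tuple((lane + dl, x + 1) in nxt for dl in (-1, 0, 1)))

def astar_action(state, log):
    """Pseudo-expert: first action of an A* plan reaching GOAL_X, or None if no plan."""
    # Heap entries: (cost + heuristic, cost, state, first action of the plan).
    frontier = [(GOAL_X - state[2], 0, state, None)]
    seen = set()
    while frontier:
        _, cost, s, first = heapq.heappop(frontier)
        if s[2] >= GOAL_X:
            return first
        if s in seen or s[0] >= HORIZON:
            continue
        seen.add(s)
        for a in ACTIONS:
            s2, crashed = step(s, a, log)
            if not crashed:
                # Heuristic = remaining distance; admissible since x grows by at most 1.
                heapq.heappush(frontier, (cost + 1 + GOAL_X - s2[2], cost + 1, s2, first or a))
    return None

def fit(dataset):
    """Tabular stand-in for BC: majority-vote action per observation."""
    votes = defaultdict(Counter)
    for obs, act in dataset:
        votes[obs][act] += 1
    return {obs: c.most_common(1)[0][0] for obs, c in votes.items()}

def rollout(policy, log):
    """Roll the policy out in the log-replay simulator; returns (visited states, success)."""
    state, visited = (0, 0, 0), []
    while state[2] < GOAL_X and state[0] < HORIZON:
        visited.append(state)
        action = policy.get(observe(state, log), (0, 1))  # unseen observation: go forward
        state, crashed = step(state, action, log)
        if crashed:
            return visited, False
    return visited, state[2] >= GOAL_X

def dagger(logs, demos, iterations=3):
    """Data aggregation: roll out, relabel visited states with A*, refit on everything."""
    dataset = list(demos)
    policy = fit(dataset)
    for _ in range(iterations):
        for log in logs:
            visited, _ = rollout(policy, log)
            for s in visited:
                label = astar_action(s, log)
                if label is not None:
                    dataset.append((observe(s, log), label))
        policy = fit(dataset)
    return policy

# Usage: one clear-road log and one log with a vehicle parked at lane 0, x = 3.
empty_log = {}
blocked_log = {t: {(0, 3)} for t in range(1, HORIZON + 1)}
# Stand-in for logged human driving: always forward on a clear road.
demos = [((0, (False, False, False)), (0, 1))] * 6
bc_success = rollout(fit(demos), blocked_log)[1]
dagger_success = rollout(dagger([empty_log, blocked_log], demos), blocked_log)[1]
print("BC only reaches goal:", bc_success)
print("After aggregation reaches goal:", dagger_success)
```

In this toy setup the demonstrations never contain a blocked lane, so the BC-only policy has no label for that observation and falls back to driving forward. The aggregation loop visits that state during rollout, asks the A* planner what to do there, and adds the answer to the training set. A real system would replace the lookup table with a learned policy and the grid with continuous vehicle states.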

