Action-Based Representation Learning for Autonomous Driving
This work addresses how to use human driving data more effectively to improve autonomous driving systems. It offers a new method that improves interpretability and reduces reliance on extensive supervision, though the approach appears incremental.
The paper proposes an action-based representation learning approach for learning from human driving data. The resulting model outperforms end-to-end models as well as previous methods based on inverse dynamics or heavy supervision, and it is more interpretable while requiring less annotated data.
Human drivers produce a vast amount of data which could, in principle, be used to improve autonomous driving systems. Unfortunately, seemingly straightforward approaches for creating end-to-end driving models that map sensor data directly into driving actions are problematic in terms of interpretability, and typically have significant difficulty dealing with spurious correlations. Alternatively, we propose to use this kind of action-based driving data for learning representations. Our experiments show that an affordance-based driving model pre-trained with this approach can leverage a relatively small amount of weakly annotated imagery and outperform pure end-to-end driving models, while being more interpretable. Further, we demonstrate how this strategy outperforms previous methods based on learning inverse dynamics models as well as other methods based on heavy human supervision (ImageNet).
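The two-stage idea described above (pre-train a representation on action data, then fit an affordance predictor on a small annotated set) can be illustrated with a minimal toy sketch. This is not the paper's architecture: it assumes a linear encoder, a synthetic 4-dimensional "sensor" vector, a single affordance (a hypothetical lane offset), and an expert that steers proportionally against that offset. All names (`W_TRUE`, `GAIN`, `pretrain_encoder`, `fit_affordance_head`) are made up for illustration.

```python
import random

# Toy assumptions (hypothetical, not from the paper): the lane offset is a
# linear function of the observation, and the expert steers against it.
W_TRUE = [0.5, -1.0, 0.25, 2.0]
GAIN = 0.8
rng = random.Random(0)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sample():
    """Return one synthetic (observation, lane_offset, expert_action) triple."""
    x = [rng.uniform(-1.0, 1.0) for _ in range(len(W_TRUE))]
    offset = dot(W_TRUE, x)
    return x, offset, -GAIN * offset


def pretrain_encoder(demos, dim, lr=0.1, epochs=20):
    """Stage 1: learn encoder weights by regressing expert actions with SGD."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, action in demos:
            err = dot(w, x) - action
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def fit_affordance_head(w, labeled):
    """Stage 2: keep the encoder frozen and fit a 1-D least-squares head."""
    feats = [dot(w, x) for x, _ in labeled]
    ys = [y for _, y in labeled]
    mf = sum(feats) / len(feats)
    my = sum(ys) / len(ys)
    var = sum((f - mf) ** 2 for f in feats)
    slope = sum((f - mf) * (y - my) for f, y in zip(feats, ys)) / var
    return slope, my - slope * mf


def predict_affordance(w, head, x):
    slope, bias = head
    return slope * dot(w, x) + bias


# Many action-labeled frames (cheap), only a handful of affordance labels.
demos = [(x, a) for x, _, a in (sample() for _ in range(200))]
labeled = [(x, off) for x, off, _ in (sample() for _ in range(5))]

encoder = pretrain_encoder(demos, dim=len(W_TRUE))
head = fit_affordance_head(encoder, labeled)
```

In this toy setting the affordance head has only two parameters, so five annotated frames suffice once the encoder has been shaped by the action data. The real method works on images with a learned deep encoder, where no such exact recovery should be expected.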