CV, LG, RO | Jul 15, 2024

DINO Pre-training for Vision-based End-to-end Autonomous Driving

arXiv:2407.10803v1 | 3 citations | h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses the challenge of enhancing visual understanding for autonomous driving systems, offering an incremental improvement over existing pre-training approaches.

The paper tackles the pre-training of visual encoders for autonomous driving agents, proposing DINO-based self-supervised pre-training in place of classification-based methods. In CARLA benchmark tests, the proposed pre-training is more efficient than classification-based pre-training and performs on par with visual place recognition pre-training.

In this article, we focus on the pre-training of visual autonomous driving agents in the context of imitation learning. Current methods often rely on classification-based pre-training, which we hypothesise limits the agent's capacity for implicit image understanding. We propose pre-training the visual encoder of a driving agent using the self-distillation with no labels (DINO) method, which relies on a self-supervised learning paradigm. Our experiments in the CARLA environment, conducted in accordance with the Leaderboard benchmark, reveal that the proposed pre-training is more efficient than classification-based pre-training and is on par with the recently proposed pre-training based on visual place recognition (VPRPre).
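As a rough illustration of the self-distillation objective that DINO is generally built on, the sketch below computes the cross-entropy between a centred, sharpened teacher distribution and a student distribution, along with the exponential moving average used for the teacher weights and the centre. This is not the paper's code: the function names are hypothetical, the temperatures and momentum values are the commonly cited DINO defaults rather than settings confirmed by this abstract, and a real pipeline would operate on batches of multi-crop encoder outputs in a deep learning framework.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_total for s in scaled]


def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between teacher and student outputs for one pair of views.

    The teacher output is centred and sharpened with a low temperature, and is
    treated as a fixed target (no gradient flows through it in real training).
    Temperatures are assumed defaults, not values taken from the paper.
    """
    centred = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = [math.exp(lp) for lp in log_softmax(centred, teacher_temp)]
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))


def ema_update(old, new, momentum):
    """Exponential moving average, used for both teacher weights and the centre."""
    return [momentum * o + (1.0 - momentum) * n for o, n in zip(old, new)]


if __name__ == "__main__":
    center = [0.0, 0.0, 0.0]
    teacher_out = [1.0, 2.0, 3.0]
    print(dino_loss([1.0, 2.0, 3.0], teacher_out, center))  # student agrees
    print(dino_loss([3.0, 2.0, 1.0], teacher_out, center))  # student disagrees
    # Centre drifts slowly towards recent teacher outputs (assumed momentum 0.9).
    center = ema_update(center, teacher_out, momentum=0.9)
    print(center)
```

The loss is small when the student ranks the output dimensions the same way as the teacher and grows when it does not. Centring and sharpening the teacher are the two mechanisms DINO uses to avoid collapse without labels.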
