CV AI RO · Apr 21, 2022

SelfD: Self-Learning Large-Scale Driving Policies From the Web

Jimuyang Zhang, Ruizhao Zhu, Eshed Ohn-Bar

arXiv:2204.10320v1 · 13.224 citations · h-index: 31

Originality: Highly original

AI Analysis

This work addresses the challenge of leveraging diverse, unlabeled internet data for complex 3D reasoning and planning in autonomous driving, offering a scalable solution for robust navigation across varied scenarios.

The paper tackles the problem of learning scalable driving policies from large amounts of unlabeled online monocular images, achieving consistent improvements of up to 24% in driving performance across multiple benchmarks without additional data collection or annotation.

Effectively utilizing the vast amounts of ego-centric navigation data that is freely available on the internet can advance generalized intelligent systems, i.e., to robustly scale across perspectives, platforms, environmental conditions, scenarios, and geographical locations. However, it is difficult to directly leverage such large amounts of unlabeled and highly diverse data for complex 3D reasoning and planning tasks. Consequently, researchers have primarily focused on its use for various auxiliary pixel- and image-level computer vision tasks that do not consider an ultimate navigational objective. In this work, we introduce SelfD, a framework for learning scalable driving by utilizing large amounts of online monocular images. Our key idea is to leverage iterative semi-supervised training when learning imitative agents from unlabeled data. To handle unconstrained viewpoints, scenes, and camera parameters, we train an image-based model that directly learns to plan in the Bird's Eye View (BEV) space. Next, we use unlabeled data to augment the decision-making knowledge and robustness of an initially trained model via self-training. In particular, we propose a pseudo-labeling step which enables making full use of highly diverse demonstration data through "hypothetical" planning-based data augmentation. We employ a large dataset of publicly available YouTube videos to train SelfD and comprehensively analyze its generalization benefits across challenging navigation scenarios. Without requiring any additional data collection or annotation efforts, SelfD demonstrates consistent improvements (by up to 24%) in driving performance evaluation on nuScenes, Argoverse, Waymo, and CARLA.
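The self-training loop described in the abstract can be illustrated with a minimal sketch of the data flow. This is not the authors' implementation: the command set, the file names, and the `fit_mean_planner` learner are hypothetical placeholders, and reading "hypothetical" augmentation as pseudo-labeling each unlabeled frame under several candidate commands is an assumption, since the abstract does not spell out the mechanism.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Waypoint = Tuple[float, float]            # (x, y) in a BEV frame; units assumed
Sample = Tuple[str, str, List[Waypoint]]  # (frame_id, command, waypoints)
Planner = Callable[[str, str], List[Waypoint]]

COMMANDS = ("left", "straight", "right")  # hypothetical command set


def fit_mean_planner(samples: Sequence[Sample]) -> Planner:
    """Toy stand-in for training: average the waypoints seen per command."""
    grouped: Dict[str, List[List[Waypoint]]] = defaultdict(list)
    for _frame, command, waypoints in samples:
        grouped[command].append(waypoints)

    means: Dict[str, List[Waypoint]] = {}
    for command, trajs in grouped.items():
        steps = min(len(t) for t in trajs)
        means[command] = [
            (
                sum(t[i][0] for t in trajs) / len(trajs),
                sum(t[i][1] for t in trajs) / len(trajs),
            )
            for i in range(steps)
        ]

    def planner(frame_id: str, command: str) -> List[Waypoint]:
        # Ignores the image entirely; a real model would condition on it.
        return means.get(command, [])

    return planner


def pseudo_label(teacher: Planner, frames: Sequence[str],
                 commands: Sequence[str] = COMMANDS) -> List[Sample]:
    """Query the teacher under every candidate command for each frame."""
    return [(f, c, teacher(f, c)) for f in frames for c in commands]


def self_train(labeled: Sequence[Sample], unlabeled_frames: Sequence[str],
               rounds: int = 2) -> Planner:
    """Iterate: pseudo-label unlabeled frames, then refit on the union."""
    model = fit_mean_planner(labeled)
    for _ in range(rounds):
        pseudo = pseudo_label(model, unlabeled_frames)
        model = fit_mean_planner(list(labeled) + pseudo)
    return model


# Made-up toy data, for illustration only.
labeled = [
    ("a.png", "left", [(-1.0, 2.0), (-2.0, 4.0)]),
    ("b.png", "straight", [(0.0, 2.0), (0.0, 4.0)]),
    ("c.png", "right", [(1.0, 2.0), (2.0, 4.0)]),
]
web_frames = ["yt_0001.png", "yt_0002.png"]

pseudo = pseudo_label(fit_mean_planner(labeled), web_frames)
student = self_train(labeled, web_frames)
print(len(pseudo), student("yt_0001.png", "left"))
```

Because the placeholder learner ignores image content, the pseudo-labels here add no new information and the student matches the initial model; the sketch only shows how one unlabeled frame can yield several training samples. Any benefit in practice would have to come from an image-conditioned model seeing the more diverse web frames.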
