CV · May 29, 2023

View-to-Label: Multi-View Consistency for Self-Supervised 3D Object Detection

Issa Mouawad, Nikolas Brasch, Fabian Manhardt, Federico Tombari, Francesca Odone

arXiv:2305.17972v11.5

Originality: Incremental advance

AI Analysis

The work addresses the need for cost-effective, widely applicable perception in autonomous driving by enabling self-supervised training without expensive 3D sensors. The advance is incremental, since it builds on existing self-supervised approaches.

The paper aims to reduce annotation costs for monocular 3D object detection in autonomous vehicles. It proposes a self-supervised method that uses only RGB sequences and multi-view constraints, and it reports performance comparable to state-of-the-art self-supervised methods that rely on LIDAR scans or stereo images.

For autonomous vehicles, driving safely depends heavily on the ability to correctly perceive the environment in 3D space, so 3D object detection is a fundamental aspect of perception. While 3D sensors deliver accurate metric perception, monocular approaches enjoy cost and availability advantages that are valuable in a wide range of applications. Unfortunately, training monocular methods requires a vast amount of annotated data. Self-supervised approaches have recently been applied successfully to ease the training process and unlock access to widely available unlabelled data. However, related research leverages priors such as LIDAR scans and stereo images, and these priors again limit usability. Therefore, in this work, we propose a novel approach that self-supervises 3D object detection purely from RGB sequences, leveraging multi-view constraints and weak labels. Our experiments on the KITTI 3D dataset demonstrate performance on par with state-of-the-art self-supervised methods that use LIDAR scans or stereo images.
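The abstract does not specify the form of its multi-view constraints. Purely as a rough illustration of the general idea, and not as the authors' method, the sketch below shows one generic kind of multi-view consistency residual: a predicted 3D object center in frame t is moved into frame t+1 using a known relative camera pose, projected with a pinhole camera, and compared against a 2D detection acting as a weak label. The function names (`transform_point`, `project`, `reprojection_error`) are hypothetical, and the sketch assumes known intrinsics `K`, a known relative pose (`R_rel`, `t_rel`), and a static object.

```python
import math

def transform_point(R, t, p):
    """Apply a rigid transform p' = R @ p + t (R: 3x3 nested lists; t, p: 3-vectors)."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def project(K, p):
    """Pinhole projection of a camera-frame point (X, Y, Z), Z > 0, to pixel (u, v)."""
    X, Y, Z = p
    if Z <= 0:
        raise ValueError("point is behind the camera")
    u = K[0][0] * X / Z + K[0][2]
    v = K[1][1] * Y / Z + K[1][2]
    return u, v

def reprojection_error(center_t, R_rel, t_rel, K, observed_uv):
    """Hypothetical consistency residual: pixel distance between the predicted
    center (frame t), mapped into frame t+1 and projected, and a 2D detection
    in frame t+1. Assumes the object is static between the two frames."""
    p_next = transform_point(R_rel, t_rel, center_t)
    u, v = project(K, p_next)
    return math.hypot(u - observed_uv[0], v - observed_uv[1])

# Example (illustrative values): the camera moves 1 m forward, so a static
# point 10 m ahead in frame t lies 9 m ahead in frame t+1.
K = [[700.0, 0.0, 600.0], [0.0, 700.0, 180.0], [0.0, 0.0, 1.0]]
I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
err = reprojection_error([2.0, 1.0, 10.0], I, [0.0, 0.0, -1.0], K, (760.0, 260.0))
```

In a self-supervised setup of this general kind, a residual like this could be minimized over predicted 3D boxes across many frames. Moving objects would additionally require estimating or masking out their own motion, which this sketch ignores.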

