CV · Mar 15, 2023

Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency

arXiv:2303.08686v1 · 28 citations · h-index: 87 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the high cost and inconsistency of data labeling for autonomous driving applications, though it is an incremental improvement over existing weakly supervised approaches.

Monocular 3D object detection normally requires expensive 3D labels for training. The paper proposes a weakly supervised method that trains with only 2D image labels. It reports performance comparable to some fully supervised methods and, when used for pre-training, significantly outperforms the corresponding fully supervised baseline while using only one-third of the 3D labels.

Monocular 3D object detection has become a mainstream approach in automatic driving for its easy application. A prominent advantage is that it does not need LiDAR point clouds during inference. However, most current methods still rely on 3D point cloud data for labeling the ground truths used in the training phase. This inconsistency between training and inference makes it hard to utilize large-scale feedback data and increases data collection expenses. To bridge this gap, we propose a new weakly supervised monocular 3D object detection method, which can train the model with only 2D labels marked on images. To be specific, we explore three types of consistency in this task, i.e., projection, multi-view and direction consistency, and design a weakly supervised architecture based on these consistencies. Moreover, we propose a new 2D direction labeling method for this task to guide the model toward accurate rotation direction prediction. Experiments show that our weakly supervised method achieves comparable performance with some fully supervised methods. When used as a pre-training method, our model can significantly outperform the corresponding fully supervised baseline with only 1/3 of the 3D labels. Code: https://github.com/weakmono3d/weakmono3d
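To make the idea of projection consistency concrete, here is a minimal sketch and not the authors' implementation. It assumes a pinhole camera, a KITTI-style camera frame (x right, y down, z forward), and a plain L1 penalty between the labeled 2D box and the tightest 2D box enclosing the projected 3D corners. All function names (`box3d_corners`, `project`, `enclosing_box`, `projection_consistency_loss`) and the choice of loss are illustrative assumptions; the paper's actual formulation may differ.

```python
import math


def box3d_corners(center, dims, yaw):
    """Return the eight corners of a 3D box in camera coordinates.

    center: (x, y, z) of the box centre; dims: (h, w, l); yaw: rotation about
    the camera y axis. Frame convention (an assumption): x right, y down, z forward.
    """
    cx, cy, cz = center
    h, w, l = dims
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-l / 2, l / 2):
        for dy in (-h / 2, h / 2):
            for dz in (-w / 2, w / 2):
                # Rotate the offset about the y axis, then translate to the centre.
                x = cx + c * dx + s * dz
                z = cz - s * dx + c * dz
                corners.append((x, cy + dy, z))
    return corners


def project(points, fx, fy, u0, v0):
    """Pinhole projection of camera-frame points (all with z > 0) to pixels."""
    return [(fx * x / z + u0, fy * y / z + v0) for x, y, z in points]


def enclosing_box(pixels):
    """Tightest axis-aligned 2D box (u_min, v_min, u_max, v_max) around pixels."""
    us = [p[0] for p in pixels]
    vs = [p[1] for p in pixels]
    return (min(us), min(vs), max(us), max(vs))


def projection_consistency_loss(center, dims, yaw, box2d, intrinsics):
    """Mean L1 distance between the projected 3D box and a labeled 2D box.

    The L1 form is a stand-in chosen for this sketch, not the paper's loss.
    """
    fx, fy, u0, v0 = intrinsics
    corners = box3d_corners(center, dims, yaw)
    pred = enclosing_box(project(corners, fx, fy, u0, v0))
    return sum(abs(p - g) for p, g in zip(pred, box2d)) / 4.0
```

In this sketch the loss is zero when the predicted 3D box projects exactly onto the labeled 2D box and grows as the two diverge, which is what lets 2D labels supervise a 3D prediction. A single 2D box does not pin down depth and orientation on its own, which is one plausible reason the abstract pairs projection with multi-view and direction consistency.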

