CV, Oct 7, 2020

Learning Monocular 3D Vehicle Detection without 3D Bounding Box Labels

L. Koestler, N. Yang, R. Wang, D. Cremers

arXiv:2010.03506v17.210 citations

Originality: Incremental advance

AI Analysis

The approach reduces labeling costs for autonomous driving systems, but the advance is incremental because it builds on existing differentiable rendering and pre-trained network techniques.

The paper tackles the problem of training 3D object detectors without costly 3D bounding box labels by using triangular meshes and differentiable rendering with losses based on depth, segmentation, and motion from pre-trained networks. It achieves promising performance on the KITTI dataset compared to state-of-the-art methods that require 3D labels.

The training of deep-learning-based 3D object detectors requires large datasets with 3D bounding box labels for supervision that have to be generated by hand-labeling. We propose a network architecture and training procedure for learning monocular 3D object detection without 3D bounding box labels. By representing the objects as triangular meshes and employing differentiable shape rendering, we define loss functions based on depth maps, segmentation masks, and ego- and object-motion, which are generated by pre-trained, off-the-shelf networks. We evaluate the proposed algorithm on the real-world KITTI dataset and achieve promising performance in comparison to state-of-the-art methods requiring 3D bounding box labels for training and superior performance to conventional baseline methods.
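To make the idea of supervising with depth maps and segmentation masks concrete, the following is a minimal sketch. It assumes a differentiable renderer has already turned the predicted mesh into a depth map and a silhouette, and that off-the-shelf networks supply the target depth and mask. The function names, the L1 depth term, the IoU-based mask term, and the weights are all illustrative assumptions, not the loss definitions from the paper. The sketch only evaluates the loss values in plain Python; a real training setup would need soft, differentiable versions of both terms.

```python
# Hypothetical illustration: names, loss forms, and weights are assumptions,
# not the definitions used in the paper.

def depth_loss(rendered_depth, predicted_depth, rendered_mask):
    """Mean absolute depth difference over pixels covered by the rendered mesh."""
    total, count = 0.0, 0
    for r_row, p_row, m_row in zip(rendered_depth, predicted_depth, rendered_mask):
        for r, p, m in zip(r_row, p_row, m_row):
            if m:  # only compare depth where the mesh is visible
                total += abs(r - p)
                count += 1
    return total / count if count else 0.0


def segmentation_loss(rendered_mask, predicted_mask):
    """One minus intersection-over-union of the rendered and predicted masks."""
    inter = union = 0
    for r_row, p_row in zip(rendered_mask, predicted_mask):
        for r, p in zip(r_row, p_row):
            inter += 1 if (r and p) else 0
            union += 1 if (r or p) else 0
    return 1.0 - inter / union if union else 0.0


def total_loss(rendered_depth, predicted_depth, rendered_mask, predicted_mask,
               w_depth=1.0, w_seg=1.0):
    """Weighted sum of the two terms; the weights are placeholder values."""
    return (w_depth * depth_loss(rendered_depth, predicted_depth, rendered_mask)
            + w_seg * segmentation_loss(rendered_mask, predicted_mask))


if __name__ == "__main__":
    # Toy 2x2 example with made-up values.
    rendered_mask = [[1, 1], [0, 0]]
    predicted_mask = [[1, 0], [0, 0]]
    rendered_depth = [[10.0, 12.0], [0.0, 0.0]]
    predicted_depth = [[11.0, 12.0], [5.0, 5.0]]
    print(total_loss(rendered_depth, predicted_depth, rendered_mask, predicted_mask))
```

Restricting the depth term to pixels covered by the rendered mesh keeps background depth from influencing the object pose and shape. The motion-based term mentioned in the abstract is omitted here.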
