CV · LG · Sep 30, 2020

Monocular Differentiable Rendering for Self-Supervised 3D Object Detection

arXiv:2009.14524v1 · 45 citations
Originality: Incremental advance
AI Analysis

This paper addresses the ambiguity between depth and scale in monocular 3D detection for autonomous driving, offering a cost-effective alternative to supervised methods.

The paper tackles 3D object detection from monocular images with a self-supervised method that uses differentiable rendering and shape priors to reconstruct 3D shapes and estimate poses. Its experiments on the KITTI dataset indicate that noisy monocular depth combined with differentiable rendering can stand in for expensive 3D labels or LiDAR data.

3D object detection from monocular images is an ill-posed problem due to the projective entanglement of depth and scale. To overcome this ambiguity, we present a novel self-supervised method for textured 3D shape reconstruction and pose estimation of rigid objects with the help of strong shape priors and 2D instance masks. Our method predicts the 3D location and meshes of each object in an image using differentiable rendering and a self-supervised objective derived from a pretrained monocular depth estimation network. We use the KITTI 3D object detection dataset to evaluate the accuracy of the method. Experiments demonstrate that we can effectively use noisy monocular depth and differentiable rendering as an alternative to expensive 3D ground-truth labels or LiDAR information.
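The "projective entanglement of depth and scale" mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simple pinhole camera; the focal length, principal point, and the helper names `project`, `box_corners`, and `projected_corners` are illustrative choices, not values or code from the paper or from KITTI. An object twice as large and twice as far away projects to the same pixels, which is why the image alone cannot fix depth and why extra cues such as shape priors or a depth estimate are needed.

```python
def project(point, focal=700.0, cx=600.0, cy=180.0):
    """Pinhole projection of a camera-frame point (x, y, z) to pixel coordinates.

    The intrinsics are made-up illustrative values, not a real calibration.
    """
    x, y, z = point
    return (focal * x / z + cx, focal * y / z + cy)


def box_corners(center, width, height):
    """Four corners of a fronto-parallel rectangle centred at `center`."""
    x, y, z = center
    return [
        (x + sx * width / 2, y + sy * height / 2, z)
        for sx in (-1, 1)
        for sy in (-1, 1)
    ]


def projected_corners(center, width, height):
    """Pixel coordinates of the rectangle's four corners."""
    return [project(c) for c in box_corners(center, width, height)]


# A car-sized rectangle 10 m away, and one twice as large at twice the distance.
near = projected_corners(center=(1.0, 0.5, 10.0), width=1.8, height=1.5)
far = projected_corners(center=(2.0, 1.0, 20.0), width=3.6, height=3.0)

# Both rectangles land on the same pixels (up to floating-point rounding).
same_image = all(
    abs(u1 - u2) < 1e-9 and abs(v1 - v2) < 1e-9
    for (u1, v1), (u2, v2) in zip(near, far)
)
print(same_image)
```

Fixing either the object's metric size (a shape prior) or its distance (a depth estimate) removes the ambiguity in this toy setting, which is the intuition behind combining both cues.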

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
