CV · Jan 10, 2020

DSGN: Deep Stereo Geometry Network for 3D Object Detection

arXiv:2001.03398v3 · 227 citations · Has Code
AI Analysis

This work addresses the problem of accurate 3D object detection for autonomous driving by reducing reliance on expensive LiDAR sensors, though it is incremental as it builds on existing stereo-based approaches.

The paper tackles the performance gap between image-based and LiDAR-based 3D object detection by introducing DSGN, a one-stage stereo-based method that uses a differentiable volumetric representation to jointly learn depth and detect objects. It reports about 10 points higher AP than previous stereo-based methods and performance comparable to several LiDAR-based methods on the KITTI 3D object detection leaderboard.

Most state-of-the-art 3D object detectors heavily rely on LiDAR sensors because there is a large performance gap between image-based and LiDAR-based methods. The gap is caused by how the representation is formed for prediction in 3D scenarios. Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap by detecting 3D objects on a differentiable volumetric representation, the 3D geometric volume, which effectively encodes 3D geometric structure for 3D regular space. With this representation, we learn depth information and semantic cues simultaneously. For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline that jointly estimates the depth and detects 3D objects in an end-to-end learning manner. Our approach outperforms previous stereo-based 3D detectors (about 10 points higher in terms of AP) and even achieves comparable performance with several LiDAR-based methods on the KITTI 3D object detection leaderboard. Our code is publicly available at https://github.com/chenyilun95/DSGN.
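The abstract's idea of estimating depth in a differentiable way can be illustrated with a generic sketch. This is not the DSGN implementation: it only shows the standard pinhole stereo relation (depth = focal length × baseline / disparity) and a soft-argmin over a per-pixel matching cost, a common way to keep depth estimation differentiable in stereo networks. The function names, the cost values, and the camera parameters below are hypothetical, chosen for illustration only.

```python
import math


def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Standard pinhole stereo relation: depth = focal * baseline / disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def soft_argmin_depth(costs, depth_bins):
    """Expected depth under softmax(-cost) over candidate depth bins.

    Unlike a hard argmin, this weighted average is differentiable with
    respect to the costs. (Illustrative only; assumed, not taken from DSGN.)
    """
    if len(costs) != len(depth_bins):
        raise ValueError("costs and depth_bins must have the same length")
    lowest = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - lowest)) for c in costs]
    total = sum(weights)
    return sum(w * d for w, d in zip(weights, depth_bins)) / total


# Hypothetical camera parameters, not real KITTI calibration values.
depth = disparity_to_depth(disparity_px=50.0, focal_px=700.0, baseline_m=0.5)

# Hypothetical matching costs for three candidate depths (in metres).
expected = soft_argmin_depth(costs=[0.2, 1.5, 3.0], depth_bins=[10.0, 20.0, 30.0])
```

A lower cost pulls the expected depth toward that bin, so `expected` here lands between 10 m and 20 m, closer to 10 m.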

Code Implementations: 1 repo
