CV · Apr 6, 2022

DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors

arXiv:2204.03039v3 · 42 citations · h-index: 106 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for more accurate and efficient camera-based 3D detection, which is crucial for applications like autonomous driving, but it is incremental as it builds upon the prior DSGN method.

The authors tackled the problem of improving stereo-based 3D object detection by enhancing information flow from 2D to 3D, resulting in DSGN++ consistently outperforming other camera-based detectors on the KITTI benchmark across all categories.

Camera-based 3D object detectors are attractive because cameras are more widely deployed and cheaper than LiDAR sensors. We first revisit how the prior stereo detector DSGN constructs stereo volumes to represent both 3D geometry and semantics. We refine the stereo modeling and propose an advanced version, DSGN++, which aims to enhance effective information flow throughout the 2D-to-3D pipeline in three main aspects. First, to effectively lift 2D information to the stereo volume, we propose depth-wise plane sweeping (DPS), which allows denser connections and extracts depth-guided features. Second, to capture differently spaced features, we present a novel stereo volume, the Dual-view Stereo Volume (DSV), which integrates front-view and top-view features and reconstructs sub-voxel depth in the camera frustum. Third, as the foreground region becomes less dominant in 3D space, we propose a multi-modal data editing strategy, Stereo-LiDAR Copy-Paste, which ensures cross-modal alignment and improves data efficiency. Without bells and whistles, extensive experiments in various modality setups on the popular KITTI benchmark show that our method consistently outperforms other camera-based 3D detectors for all categories. Code is available at https://github.com/chenyilun95/DSGN2.
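
For readers unfamiliar with plane sweeping, the following is a minimal sketch of a generic concatenation-based plane-sweep volume, the classical construction that the abstract's depth-wise plane sweeping builds on. It is not the DPS operator or the DSGN++ code. The function names `plane_sweep_volume` and `disparity_to_depth`, the integer disparity candidates, and the feature-map layout `(C, H, W)` are all assumptions made for illustration.

```python
import numpy as np


def plane_sweep_volume(left, right, disparities):
    """Build a generic plane-sweep volume from rectified stereo features.

    left, right: feature maps of shape (C, H, W) (assumed layout).
    disparities: sequence of non-negative integer disparity candidates.
    Returns an array of shape (2C, D, H, W), where D = len(disparities).
    """
    C, H, W = left.shape
    D = len(disparities)
    volume = np.zeros((2 * C, D, H, W), dtype=left.dtype)
    for i, d in enumerate(disparities):
        if d >= W:
            continue  # no overlapping columns at this disparity
        # A left pixel at column x matches the right pixel at column x - d,
        # so only columns x >= d have a valid counterpart.
        volume[:C, i, :, d:] = left[:, :, d:]
        volume[C:, i, :, d:] = right[:, :, : W - d]
    return volume


def disparity_to_depth(disparity, focal_px, baseline_m):
    """Standard rectified-stereo relation: depth = focal * baseline / disparity."""
    return focal_px * baseline_m / disparity
```

Each slice along the disparity axis pairs the left features with the right features shifted by one candidate disparity, so a downstream 3D network can score which candidate aligns the two views. How DSGN++ densifies these connections and extracts depth-guided features is described in the paper itself, not in this sketch.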

Code Implementations: 1 repo