Densely Constrained Depth Estimator for Monocular 3D Object Detection
This work addresses a key bottleneck in monocular 3D object detection for autonomous driving, namely depth estimation accuracy. The contribution is incremental, since it builds on existing projection-constraint methods.
The paper tackles inaccurate depth estimation in monocular 3D object detection by proposing a method that uses dense projection constraints from edges of any direction. It reports state-of-the-art performance on the KITTI and WOD benchmarks.
Estimating accurate 3D locations of objects from monocular images is challenging because a single image lacks depth information. Previous work shows that using the object's keypoint projection constraints to estimate multiple depth candidates boosts detection performance. However, existing methods can only use vertical edges as projection constraints for depth estimation. As a result, they use only a small number of projection constraints and produce too few depth candidates, leading to inaccurate depth estimation. In this paper, we propose a method that uses dense projection constraints from edges of any direction. In this way, we employ many more projection constraints and produce a considerable number of depth candidates. In addition, we present a graph matching weighting module to merge the depth candidates. The proposed method, DCD (Densely Constrained Detector), achieves state-of-the-art performance on the KITTI and WOD benchmarks. Code is released at https://github.com/BraveGroup/DCD.
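To make the idea of a projection constraint from an edge of any direction concrete, the following is a minimal sketch under a plain pinhole camera model. It is not taken from the released DCD code. The function names, the yaw sign convention, and the use of pixel coordinates measured relative to the principal point are all assumptions made for illustration. For two keypoints i and j with known object-frame positions, the projection equations u_i (Z_i + z) = f (X_i + x) and u_j (Z_j + z) = f (X_j + x) can be subtracted to eliminate x, which gives one candidate for the object-centre depth z; the v coordinates give a second candidate in the same way. The weighted merge at the end is a simple stand-in: the weights are hypothetical inputs and do not reproduce the paper's graph matching weighting module.

```python
import math


def rotate_yaw(point, yaw):
    """Rotate an object-frame point (x, y, z) about the vertical (y) axis.

    The sign convention is an assumption made for this sketch.
    """
    x, y, z = point
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x + s * z, y, -s * x + c * z)


def project(point_cam, f):
    """Pinhole projection; pixel coordinates are relative to the principal point."""
    X, Y, Z = point_cam
    return (f * X / Z, f * Y / Z)


def edge_depth_candidates(p_i, p_j, uv_i, uv_j, yaw, f, eps=1e-6):
    """Depth candidates for the object centre from one edge (keypoint pair).

    p_i, p_j   : keypoint positions in the object frame (from the 3D box size)
    uv_i, uv_j : their image projections, relative to the principal point
    Returns up to two candidates, one from the u axis and one from the v axis.
    """
    Xi, Yi, Zi = rotate_yaw(p_i, yaw)
    Xj, Yj, Zj = rotate_yaw(p_j, yaw)
    (ui, vi), (uj, vj) = uv_i, uv_j
    candidates = []
    # u_i (Z_i + z) = f (X_i + x), same for j; subtracting removes x.
    if abs(ui - uj) > eps:
        candidates.append((f * (Xi - Xj) - (ui * Zi - uj * Zj)) / (ui - uj))
    # v_i (Z_i + z) = f (Y_i + y), same for j; subtracting removes y.
    if abs(vi - vj) > eps:
        candidates.append((f * (Yi - Yj) - (vi * Zi - vj * Zj)) / (vi - vj))
    return candidates


def merge_depths(depths, weights):
    """Weighted average of depth candidates (weights are hypothetical inputs)."""
    total = sum(weights)
    return sum(d * w for d, w in zip(depths, weights)) / total
```

For a purely vertical edge the v-axis expression reduces to the familiar similar-triangles relation (edge depth equals focal length times 3D height divided by pixel height), while the u-axis candidate is skipped because the two keypoints share the same column. An edge of general direction yields candidates from both axes, which illustrates how admitting all edges increases the number of depth candidates.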