Center3D: Center-based Monocular 3D Object Detection with Joint Depth Understanding
This work addresses monocular 3D object detection for autonomous driving and achieves a state-of-the-art speed-accuracy trade-off. The contribution is incremental, however, since it builds on existing center-based detection and depth estimation techniques.
The paper tackles the problem of localizing objects in 3D space from monocular RGB images. It proposes Center3D, a one-stage anchor-free method. On moderate-difficulty objects of the KITTI dataset, Center3D improves average precision (AP) in bird's-eye view (BEV) from 29.7% to 42.8% and in 3D from 18.6% to 39.1%.
Localizing objects in 3D space and understanding their associated 3D properties is challenging given only monocular RGB images. The situation is compounded by the loss of depth information during perspective projection. We present Center3D, a one-stage anchor-free approach that efficiently estimates 3D location and depth from monocular RGB images alone. By exploiting the difference between 2D and 3D centers, we are able to estimate depth consistently. Center3D combines classification and regression to understand the hidden depth information more robustly than either method alone. Our method employs two joint approaches: (1) LID, a classification-dominated approach with sequential Linear Increasing Discretization; and (2) DepJoint, a regression-dominated approach with multiple Eigen's transformations for depth estimation. Evaluated on moderate objects of the KITTI dataset, Center3D improves the AP in BEV from 29.7% to 42.8%, and the AP in 3D from 18.6% to 39.1%. Compared with state-of-the-art detectors, Center3D achieves the best speed-accuracy trade-off in real-time monocular object detection.
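The idea of exploiting the difference between 2D and 3D centers can be illustrated with a CenterNet-style decoding step. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes three things. First, the network predicts the 2D box center plus an offset to the projected 3D center. Second, depth is decoded with the output transformation d = 1/sigmoid(o) - 1, which CenterNet attributes to Eigen et al. Third, the camera follows a standard pinhole model with intrinsics fx, fy, cx, cy. All function names and numbers are hypothetical. DepJoint's use of multiple such transformations is not reproduced here.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def eigen_depth(raw: float) -> float:
    """Decode a raw network output into metric depth via d = 1/sigmoid(raw) - 1.

    Assumption: this is the single output transformation CenterNet attributes
    to Eigen et al.; algebraically it equals exp(-raw).
    """
    return 1.0 / sigmoid(raw) - 1.0


def project(x: float, y: float, z: float,
            fx: float, fy: float, cx: float, cy: float) -> tuple[float, float]:
    """Pinhole projection of a camera-frame point (x right, y down, z forward)."""
    return fx * x / z + cx, fy * y / z + cy


def decode_center(u2d: float, v2d: float, du: float, dv: float, raw_depth: float,
                  fx: float, fy: float, cx: float, cy: float) -> tuple[float, float, float]:
    """Hypothetical decoder: 2D center + offset -> projected 3D center -> 3D point."""
    # Shift the 2D box center by the regressed offset to the projected 3D center.
    u3d, v3d = u2d + du, v2d + dv
    # Decode depth, then back-project by inverting the pinhole projection.
    z = eigen_depth(raw_depth)
    x = (u3d - cx) * z / fx
    y = (v3d - cy) * z / fy
    return x, y, z


if __name__ == "__main__":
    # Illustrative KITTI-like intrinsics and predictions (not real data).
    fx = fy = 720.0
    cx, cy = 620.0, 180.0
    print(decode_center(685.0, 230.0, 7.0, 4.0, -math.log(20.0), fx, fy, cx, cy))
```

Consider intrinsics fx = fy = 720, cx = 620, cy = 180, chosen only for illustration. With a 2D center at (685, 230), an offset of (7, 4) and a raw depth output of -ln 20, the decoder returns approximately (2.0, 1.5, 20.0) m. Because 1/sigmoid(o) - 1 simplifies to exp(-o), the raw output behaves like a negative log-depth. This keeps the decoded depth positive.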
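Linear Increasing Discretization splits a depth range into bins whose widths grow linearly with distance. As a result, nearby depths are resolved more finely than distant ones. The following is a minimal sketch assuming the formulation commonly used in the literature. For D bins over [d_min, d_max], bin i has width δ·(i+1), with δ = 2(d_max - d_min)/(D(D+1)). The function names and example ranges are hypothetical. The sketch covers only the discretization itself, not the sequential classification that Center3D performs over these bins.

```python
import math

# Hypothetical helpers; they assume the common LID formulation, which may
# differ in detail from the variant used inside Center3D.


def _lid_delta(d_min: float, d_max: float, num_bins: int) -> float:
    # Width of the first bin; bin i then has width delta * (i + 1).
    return 2.0 * (d_max - d_min) / (num_bins * (num_bins + 1))


def lid_bin_edges(d_min: float, d_max: float, num_bins: int) -> list[float]:
    """Return the num_bins + 1 edges e_i = d_min + delta * i * (i + 1) / 2."""
    delta = _lid_delta(d_min, d_max, num_bins)
    return [d_min + delta * i * (i + 1) / 2.0 for i in range(num_bins + 1)]


def lid_bin_index(depth: float, d_min: float, d_max: float, num_bins: int) -> int:
    """Map a metric depth to its LID bin by inverting the quadratic edge formula."""
    depth = min(max(depth, d_min), d_max)  # clamp so the square root stays real
    x = (depth - d_min) / _lid_delta(d_min, d_max, num_bins)
    idx = math.floor(-0.5 + 0.5 * math.sqrt(1.0 + 8.0 * x))
    return min(idx, num_bins - 1)  # depth == d_max falls on the last edge


def lid_bin_center(idx: int, d_min: float, d_max: float, num_bins: int) -> float:
    """Coarse depth estimate for a classified bin: the midpoint of its edges."""
    edges = lid_bin_edges(d_min, d_max, num_bins)
    return 0.5 * (edges[idx] + edges[idx + 1])


if __name__ == "__main__":
    print(lid_bin_edges(0.0, 60.0, 3))
    print(lid_bin_index(20.0, 0.0, 60.0, 3), lid_bin_center(1, 0.0, 60.0, 3))
```

For example, `lid_bin_edges(0.0, 60.0, 3)` places edges at 0, 10, 30 and 60 m, giving widths of 10, 20 and 30 m. A depth of 20 m therefore falls in bin 1, whose midpoint is 20 m. Classifying into such bins yields only a coarse depth. This is why the abstract pairs classification with regression.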