Stereo CenterNet based 3D Object Detection for Autonomous Driving
This work targets real-time 3D object detection for autonomous driving, offering an incremental improvement over existing methods, chiefly by reducing computational cost.
The paper tackles 3D object detection from stereo images for autonomous driving by proposing Stereo CenterNet (SC), which predicts semantic key points and uses geometric information to restore 3D bounding boxes. On the KITTI dataset, SC achieves the best speed-accuracy trade-off among the compared methods that use no extra data.
Three-dimensional (3D) object detection from stereo images has recently progressed remarkably. However, most advanced methods rely on anchor-based two-dimensional (2D) detection or on depth estimation, and their high computational cost prevents them from achieving real-time performance. In this study, we propose Stereo CenterNet (SC), a 3D object detection method that exploits geometric information in stereo imagery. SC predicts four semantic key points of the object's 3D bounding box and uses the 2D left and right boxes, the 3D dimensions, the orientation, and the key points to recover the object's bounding box in 3D space. An improved photometric alignment module then further refines the position of the 3D bounding box. Experiments on the KITTI dataset indicate that SC achieves the best speed-accuracy trade-off among advanced methods that use no extra data.
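The geometric recovery step above rests on standard rectified-stereo triangulation: a point seen at column u_l in the left image and u_r in the right image lies at depth z = f·b / (u_l − u_r), where f is the focal length in pixels and b the baseline. The sketch below is a simplified illustration under assumed conventions, not SC's actual solver. It triangulates only the centers of the left and right 2D boxes and places an oriented box of the given size there. SC additionally uses the four predicted key points and then refines the box with photometric alignment. The names `StereoCamera`, `triangulate`, `box_corners`, and `initial_box`, the camera values, and the choice of the box's geometric center (KITTI labels use the bottom center) are all hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical sketch: a crude stereo initialization of a 3D box,
# NOT the Stereo CenterNet solver (no key points, no photometric refinement).


@dataclass
class StereoCamera:
    """Rectified stereo pair; all values are illustrative assumptions."""
    f: float         # focal length in pixels (fx == fy assumed)
    cx: float        # principal point x (pixels)
    cy: float        # principal point y (pixels)
    baseline: float  # distance between the two cameras (meters)


def triangulate(cam, u_left, u_right, v):
    """Recover a 3D point from matching columns in the left/right images."""
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = cam.f * cam.baseline / disparity  # depth from disparity
    x = (u_left - cam.cx) * z / cam.f     # back-project through the left camera
    y = (v - cam.cy) * z / cam.f
    return x, y, z


def box_corners(center, dims, yaw):
    """Eight corners of a 3D box.

    center: assumed geometric center (x, y, z) in camera coordinates
            (x right, y down, z forward, as in KITTI)
    dims:   (h, w, l) height, width, length in meters
    yaw:    rotation about the vertical y axis in radians
    """
    h, w, l = dims
    cos_r, sin_r = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (l / 2, -l / 2):
        for dy in (h / 2, -h / 2):
            for dz in (w / 2, -w / 2):
                # Rotate the offset about the y axis, then translate to the center.
                x = cos_r * dx + sin_r * dz
                z = -sin_r * dx + cos_r * dz
                corners.append((center[0] + x, center[1] + dy, center[2] + z))
    return corners


def initial_box(cam, left_box, right_box, dims, yaw):
    """Crude initial 3D box from 2D left/right boxes given as (u1, v1, u2, v2)."""
    u_left = (left_box[0] + left_box[2]) / 2
    u_right = (right_box[0] + right_box[2]) / 2
    v = (left_box[1] + left_box[3]) / 2
    center = triangulate(cam, u_left, u_right, v)
    return center, box_corners(center, dims, yaw)


if __name__ == "__main__":
    # Illustrative, roughly KITTI-like calibration (assumed, not from the paper).
    cam = StereoCamera(f=720.0, cx=610.0, cy=175.0, baseline=0.54)
    center, corners = initial_box(
        cam, (560, 150, 660, 210), (524, 150, 624, 210), (1.5, 1.6, 3.9), 0.0
    )
    print(center)  # approximately (0.0, 0.075, 10.8): 36 px of disparity -> 10.8 m
    print(len(corners))
```

Because depth is inversely proportional to disparity, a one-pixel error at 36 px of disparity already shifts the depth by roughly 0.3 m in this example, which is why a refinement stage such as SC's photometric alignment matters after any such initialization.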