CRAB: Camera-Radar Fusion for Reducing Depth Ambiguity in Backward Projection based View Transformation
The work addresses a specific problem in autonomous driving: false positives in 3D object detection caused by depth ambiguity. The contribution is incremental, since it builds on existing backward projection methods.
The paper tackles depth ambiguity in camera-radar fusion for 3D object detection by proposing CRAB, a model that uses backward projection with radar to improve depth distinction. On the nuScenes dataset it reaches 62.4% NDS and 54.0% mAP, which the authors report as state of the art among backward projection-based camera-radar fusion methods.
Recently, camera-radar fusion-based 3D object detection methods in bird's eye view (BEV) have gained attention due to the complementary characteristics and cost-effectiveness of these sensors. Previous approaches using forward projection struggle with sparse BEV feature generation, while those employing backward projection overlook depth ambiguity, leading to false positives. To address these limitations, we propose a novel camera-radar fusion-based 3D object detection and segmentation model named CRAB (Camera-Radar fusion for reducing depth Ambiguity in Backward projection-based view transformation), which uses a backward projection that leverages radar to mitigate depth ambiguity. During the view transformation, CRAB aggregates perspective-view image context features into BEV queries. It improves depth distinction among queries along the same ray by combining the dense but unreliable depth distribution from images with the sparse yet precise depth information from radar occupancy. We further introduce spatial cross-attention with a feature map containing radar context information to enhance comprehension of the 3D scene. When evaluated on the nuScenes open dataset, our proposed approach achieves state-of-the-art performance among backward projection-based camera-radar fusion methods, with 62.4% NDS and 54.0% mAP in 3D object detection.
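To make the depth-combination idea concrete, the following is a minimal sketch of how a dense image depth distribution and a sparse radar occupancy vector might be blended for the queries on a single camera ray. It is not the CRAB implementation: the abstract does not give the fusion formula, so the convex blend, the `radar_weight` parameter, and both function names are assumptions made purely for illustration.

```python
def fuse_depth_along_ray(image_depth_prob, radar_occupancy, radar_weight=0.5):
    """Blend image depth probabilities with radar occupancy on one ray.

    image_depth_prob: per-depth-bin probabilities from the camera (dense,
        but uncertain); expected to sum to 1.
    radar_occupancy: per-depth-bin occupancy from radar (sparse, but
        precise); zero wherever radar returned nothing.
    radar_weight: hypothetical blending factor in [0, 1]; an assumption,
        not a value taken from the paper.
    """
    total_radar = sum(radar_occupancy)
    if total_radar == 0:
        # No radar return on this ray: fall back to the image estimate.
        return list(image_depth_prob)
    fused = [
        (1.0 - radar_weight) * p + radar_weight * (r / total_radar)
        for p, r in zip(image_depth_prob, radar_occupancy)
    ]
    norm = sum(fused)
    return [f / norm for f in fused]


def weight_queries_on_ray(image_feature, depth_weights):
    """Scale one image feature vector by the depth weight of each query.

    Without depth weighting, every BEV query on the ray would receive the
    same image feature, which is the ambiguity that produces false positives.
    """
    return [[w * x for x in image_feature] for w in depth_weights]


if __name__ == "__main__":
    # Four depth bins; the camera is undecided, radar reports a hit in bin 2.
    image_prob = [0.25, 0.25, 0.25, 0.25]
    radar_occ = [0.0, 0.0, 1.0, 0.0]
    weights = fuse_depth_along_ray(image_prob, radar_occ)
    print(weights)
    print(weight_queries_on_ray([1.0, 2.0], weights))
```

In this toy setting the camera alone assigns equal weight to all four queries on the ray, while the fused weights concentrate on the bin where radar reports an occupied cell. A learned model would presumably predict these quantities with networks rather than use a fixed blend.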