UniBEVFusion: Unified Radar-Vision BEVFusion for 3D Object Detection
This work addresses robustness and performance issues in autonomous driving systems using radar-vision fusion, offering incremental improvements over existing methods.
The paper tackles two problems in radar-vision fusion for 3D object detection: underuse of radar-specific information and sensitivity to failure of the vision modality. It proposes UniBEVFusion, which combines a Radar Depth Lift-Splat-Shoot module with a Unified Feature Fusion approach. Compared to state-of-the-art models on the TJ4D dataset, it improves 3D object detection accuracy by 1.44 and BEV object detection accuracy by 1.72.
4D millimeter-wave (MMW) radar, which provides height information and denser point cloud data than 3D MMW radar, has become increasingly popular in 3D object detection. In recent years, radar-vision fusion models have demonstrated performance close to that of LiDAR-based models, while offering lower hardware costs and better resilience in extreme conditions. However, many radar-vision fusion models treat radar as a sparse LiDAR and so underutilize radar-specific information. Additionally, these multi-modal networks are often sensitive to the failure of a single modality, particularly vision. To address these challenges, we propose the Radar Depth Lift-Splat-Shoot (RDL) module, which integrates radar-specific data into the depth prediction process, enhancing the quality of visual Bird's-Eye View (BEV) features. We further introduce a Unified Feature Fusion (UFF) approach that extracts BEV features across different modalities using a shared module. To assess the robustness of multi-modal models, we develop a novel Failure Test (FT) ablation experiment, which simulates vision modality failure by injecting Gaussian noise. We conduct extensive experiments on the View-of-Delft (VoD) and TJ4D datasets. The results demonstrate that our proposed Unified BEVFusion (UniBEVFusion) network significantly outperforms state-of-the-art models on the TJ4D dataset, with improvements of 1.44 in 3D and 1.72 in BEV object detection accuracy.
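As an illustration of the Failure Test idea described above, the following is a minimal sketch of corrupting a camera frame with Gaussian noise before it reaches a fusion model. The abstract does not specify how the noise is applied, so everything here is an assumption made for illustration: the function name `inject_gaussian_noise`, the additive zero-mean noise model, the `sigma` parameter, and the clamping to the range [0, 255] are all hypothetical and are not taken from the paper. The sketch uses only the Python standard library so that it runs without extra dependencies.

```python
import random


def inject_gaussian_noise(image, sigma, seed=None):
    """Return a noisy copy of `image`; the input is left unmodified.

    `image` is a nested list indexed as [row][col][channel] with
    intensities in [0, 255]. Zero-mean Gaussian noise with standard
    deviation `sigma` is added to every value (an assumed noise model,
    not the paper's), and results are clamped back into [0, 255].
    """
    rng = random.Random(seed)  # local generator keeps runs reproducible
    noisy = []
    for row in image:
        noisy_row = []
        for pixel in row:
            noisy_row.append([
                min(255.0, max(0.0, value + rng.gauss(0.0, sigma)))
                for value in pixel
            ])
        noisy.append(noisy_row)
    return noisy


# A tiny 2x2 RGB frame standing in for a real camera image.
frame = [[[128.0, 128.0, 128.0] for _ in range(2)] for _ in range(2)]

# sigma = 0 leaves the frame unchanged; a larger sigma degrades it.
clean = inject_gaussian_noise(frame, sigma=0.0, seed=0)
corrupted = inject_gaussian_noise(frame, sigma=50.0, seed=0)
```

In a robustness study along these lines, one would presumably evaluate the same trained model on both the clean and the corrupted inputs and compare detection accuracy; the noise level that counts as a "failure" is a design choice the abstract leaves open.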