CV · AI · Sep 23, 2024

UniBEVFusion: Unified Radar-Vision BEVFusion for 3D Object Detection

arXiv:2409.14751v1 · 8 citations · h-index: 11
AI Analysis

This work addresses robustness and performance issues in autonomous driving systems using radar-vision fusion, offering incremental improvements over existing methods.

The paper tackles two problems in radar-vision fusion for 3D object detection: underuse of radar-specific information and sensitivity to vision failure. It proposes UniBEVFusion, which combines a Radar Depth Lift-Splat-Shoot (RDL) module with a Unified Feature Fusion (UFF) approach. Compared to state-of-the-art models on the TJ4D dataset, it improves 3D object detection accuracy by 1.44 and BEV accuracy by 1.72.

4D millimeter-wave (MMW) radar, which adds height information and provides denser point clouds than 3D MMW radar, has become increasingly popular in 3D object detection. In recent years, radar-vision fusion models have demonstrated performance close to that of LiDAR-based models, with the advantages of lower hardware cost and better resilience in extreme conditions. However, many radar-vision fusion models treat radar as a sparse LiDAR, underutilizing radar-specific information. Additionally, these multi-modal networks are often sensitive to the failure of a single modality, particularly vision. To address these challenges, we propose the Radar Depth Lift-Splat-Shoot (RDL) module, which integrates radar-specific data into the depth prediction process, enhancing the quality of visual Bird's-Eye View (BEV) features. We further introduce a Unified Feature Fusion (UFF) approach that extracts BEV features across different modalities using a shared module. To assess the robustness of multi-modal models, we develop a novel Failure Test (FT) ablation experiment, which simulates vision modality failure by injecting Gaussian noise. We conduct extensive experiments on the View-of-Delft (VoD) and TJ4D datasets. The results demonstrate that our proposed Unified BEVFusion (UniBEVFusion) network significantly outperforms state-of-the-art models on the TJ4D dataset, with improvements of 1.44 in 3D and 1.72 in BEV object detection accuracy.
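
The Failure Test idea described in the abstract, simulating a failed vision modality with Gaussian noise, can be illustrated with a minimal sketch. This is not the authors' implementation. The function name simulate_vision_failure, the noise_std and replace parameters, and the assumption that the camera image is a normalized floating-point array are all hypothetical choices made for illustration. The abstract does not say whether the noise replaces the image or is added to it, so the sketch offers both.

```python
import numpy as np


def simulate_vision_failure(image, noise_std=1.0, replace=True, seed=None):
    """Return a corrupted copy of a camera image to mimic vision failure.

    Hypothetical helper, not taken from the paper. `image` is assumed to be
    a normalized floating-point array, e.g. shape (C, H, W).
    """
    rng = np.random.default_rng(seed)
    # Zero-mean Gaussian noise with the same shape as the input image.
    noise = rng.normal(loc=0.0, scale=noise_std, size=image.shape)
    if replace:
        # Assumption: total failure, the image content is lost entirely.
        corrupted = noise
    else:
        # Assumption: partial failure, noise is superimposed on the image.
        corrupted = image + noise
    # Keep the input dtype so the downstream model sees the usual format.
    return corrupted.astype(image.dtype)


# Usage: corrupt only the camera input and leave the radar input untouched,
# then compare detection metrics against the clean-input run.
clean_image = np.zeros((3, 8, 8), dtype=np.float32)
failed_image = simulate_vision_failure(clean_image, noise_std=1.0, seed=0)
```

Under these assumptions, a model that degrades gracefully when given failed_image alongside intact radar data would count as robust to vision failure, which is the property the FT ablation is meant to probe.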

