CV, Feb 24

Boosting Instance Awareness via Cross-View Correlation with 4D Radar and Camera for 3D Object Detection

arXiv:2602.20632v1, 1 citation, h-index: 5, Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of reliable instance activation under weak radar geometry in autonomous driving perception. It is an incremental improvement over existing fusion methods.

The paper tackles 3D object detection with 4D radar and camera fusion, where sparse radar data limits instance awareness. It proposes SIFormer, a scene-instance aware transformer that bridges the BEV-level and perspective-level fusion paradigms to strengthen instance focus while keeping scene context. It reports state-of-the-art performance on the View-of-Delft, TJ4DRadSet, and nuScenes datasets.

4D millimeter-wave radar has emerged as a promising sensing modality for autonomous driving due to its robustness and affordability. However, its sparse and weak geometric cues make reliable instance activation difficult, limiting the effectiveness of existing radar-camera fusion paradigms. BEV-level fusion offers global scene understanding but suffers from weak instance focus, while perspective-level fusion captures instance details but lacks holistic context. To address these limitations, we propose SIFormer, a scene-instance aware transformer for 3D object detection using 4D radar and camera. SIFormer first suppresses background noise during view transformation through segmentation- and depth-guided localization. It then introduces a cross-view activation mechanism that injects 2D instance cues into BEV space, enabling reliable instance awareness under weak radar geometry. Finally, a transformer-based fusion module aggregates complementary image semantics and radar geometry for robust perception. By enhancing instance awareness, SIFormer bridges the gap between the two paradigms, combining their complementary strengths to address the inherently sparse nature of radar and improve detection accuracy. Experiments demonstrate that SIFormer achieves state-of-the-art performance on the View-of-Delft, TJ4DRadSet, and nuScenes datasets. Source code is available at github.com/shawnnnkb/SIFormer.
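
To make the idea of segmentation- and depth-guided view transformation more concrete, here is a minimal NumPy sketch. It is not the SIFormer implementation. It assumes a simplified polar BEV grid in which each image column maps to one BEV azimuth column and each depth bin maps to one BEV range row, and the names `guided_lift_to_bev`, `feat`, `depth_logits`, and `fg_prob` are hypothetical. Each pixel's feature is spread over depth bins by its predicted depth distribution and scaled by its foreground probability, so background pixels contribute little to the BEV features.

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def guided_lift_to_bev(feat, depth_logits, fg_prob):
    """Toy foreground- and depth-weighted lift (an assumption, not the paper's code).

    feat:         (C, H, W) image features
    depth_logits: (D, H, W) per-pixel depth-bin scores
    fg_prob:      (H, W) foreground probability in [0, 1]
    returns:      (C, D, W) features on a simplified polar BEV grid
    """
    depth_prob = softmax(depth_logits, axis=0)  # each pixel sums to 1 over D
    # Weight each pixel by foreground probability and depth probability,
    # then collapse the image height axis.
    return np.einsum("hw,dhw,chw->cdw", fg_prob, depth_prob, feat)

rng = np.random.default_rng(0)
C, D, H, W = 4, 6, 8, 10
feat = rng.normal(size=(C, H, W))
depth_logits = rng.normal(size=(D, H, W))

# Hypothetical segmentation output: one foreground box, background elsewhere.
fg_prob = np.zeros((H, W))
fg_prob[2:5, 3:6] = 1.0

bev = guided_lift_to_bev(feat, depth_logits, fg_prob)
print(bev.shape)
```

In this toy setup, BEV columns that see only background pixels stay exactly zero, which is the noise-suppression effect the abstract describes. A real pipeline would use camera intrinsics and extrinsics to place the lifted features in a metric BEV grid rather than the column-to-azimuth shortcut assumed here.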

