CV | Apr 9, 2023

Sparse Dense Fusion for 3D Object Detection

arXiv:2304.04179v1 | 16 citations | h-index: 28
Originality: Incremental advance
AI Analysis

This work addresses a key challenge in multimodal 3D object detection for autonomous driving, offering an incremental improvement over existing fusion methods.

The paper tackles information loss in camera-LiDAR fusion for 3D object detection. It proposes Sparse Dense Fusion (SDF), a complementary framework that combines sparse and dense fusion modules and improves on its baseline by 4.3% in mAP and 2.5% in NDS on the nuScenes benchmark.
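For readers unfamiliar with the second metric, the nuScenes Detection Score (NDS) combines mAP with five mean true-positive error terms (translation, scale, orientation, velocity, attribute), each clipped to at most 1. The sketch below shows that weighting as it is commonly stated for the benchmark; the function name `nds` and the dictionary layout are illustrative choices, not part of any official toolkit.

```python
# Minimal sketch of the nuScenes Detection Score (NDS) weighting.
# Assumption: inputs are already-aggregated benchmark metrics; the
# function name and argument layout are hypothetical, for illustration.

TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")


def nds(m_ap, tp_errors):
    """Return NDS = (5 * mAP + sum(1 - min(1, err))) / 10.

    m_ap      -- mean average precision in [0, 1]
    tp_errors -- dict mapping each name in TP_METRICS to its mean error
    """
    missing = [name for name in TP_METRICS if name not in tp_errors]
    if missing:
        raise ValueError(f"missing true-positive metrics: {missing}")
    # Each error is clipped to 1 and turned into a score in [0, 1].
    tp_scores = [1.0 - min(1.0, tp_errors[name]) for name in TP_METRICS]
    return (5.0 * m_ap + sum(tp_scores)) / 10.0
```

Because mAP carries half of the total weight, a change in mAP alone moves NDS by half as much when the five error terms stay fixed.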

With the prevalence of multimodal learning, camera-LiDAR fusion has gained popularity in 3D object detection. Although multiple fusion approaches have been proposed, they can be classified as either sparse-only or dense-only based on the feature representation in the fusion module. In this paper, we analyze them in a common taxonomy and observe two challenges: 1) sparse-only solutions preserve the 3D geometric prior yet lose rich semantic information from the camera, and 2) dense-only alternatives retain semantic continuity but miss the accurate geometric information from LiDAR. By analyzing these two formulations, we conclude that the information loss is inevitable due to their design scheme. To compensate for the information loss in either manner, we propose Sparse Dense Fusion (SDF), a complementary framework that incorporates both sparse-fusion and dense-fusion modules via the Transformer architecture. Such a simple yet effective sparse-dense fusion structure enriches semantic texture and exploits spatial structure information simultaneously. Through our SDF strategy, we assemble two popular methods with moderate performance and outperform the baseline by 4.3% in mAP and 2.5% in NDS, ranking first on the nuScenes benchmark. Extensive ablations demonstrate the effectiveness of our method and empirically align with our analysis.
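To make the sparse/dense distinction concrete, here is a toy sketch in plain Python. It is not the paper's architecture: the abstract does not specify the modules, so every function name, the residual additions, and the choice of cell-wise averaging for the dense branch are assumptions made purely for illustration. The sparse branch lets a handful of object-level LiDAR queries attend to camera features; the dense branch merges two bird's-eye-view grids cell by cell; a final cross-attention lets the sparse queries read from the dense map.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys, values):
    """Single-head scaled dot-product cross-attention on lists of vectors."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def sparse_fusion(lidar_queries, camera_feats):
    """Assumed sparse branch: LiDAR queries gather camera semantics."""
    attended = cross_attend(lidar_queries, camera_feats, camera_feats)
    return [[q + a for q, a in zip(q_row, a_row)]
            for q_row, a_row in zip(lidar_queries, attended)]


def dense_fusion(lidar_bev, camera_bev):
    """Assumed dense branch: cell-wise average of two flattened BEV grids."""
    return [[(l + c) / 2.0 for l, c in zip(l_cell, c_cell)]
            for l_cell, c_cell in zip(lidar_bev, camera_bev)]


def sparse_dense_fusion(lidar_queries, camera_feats, lidar_bev, camera_bev):
    """Toy combination: fused sparse queries attend to the fused dense map."""
    sparse = sparse_fusion(lidar_queries, camera_feats)
    dense = dense_fusion(lidar_bev, camera_bev)
    context = cross_attend(sparse, dense, dense)
    return [[s + c for s, c in zip(s_row, c_row)]
            for s_row, c_row in zip(sparse, context)]
```

The output has one vector per sparse query, so a downstream detection head would still operate on object-level features while having seen the dense scene context.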

