CV | Nov 1, 2024

GAFusion: Adaptive Fusing LiDAR and Camera with Multiple Guidance for 3D Object Detection

arXiv:2411.00340v1 | 29 citations | h-index: 19 | CVPR
Originality: Highly original
AI Analysis

This work addresses 3D object detection for autonomous driving systems, presenting an incremental improvement through novel fusion mechanisms.

The paper tackles the insufficient complementary interaction between LiDAR and camera features in multi-modality 3D object detection. It proposes GAFusion, which reports state-of-the-art results of 73.6% mAP and 74.9% NDS on the nuScenes test set.

Recent years have witnessed remarkable progress in 3D multi-modality object detection methods based on the Bird's-Eye-View (BEV) perspective. However, most of them overlook the complementary interaction and guidance between LiDAR and camera. In this work, we propose a novel multi-modality 3D object detection method, named GAFusion, with LiDAR-guided global interaction and adaptive fusion. Specifically, we introduce sparse depth guidance (SDG) and LiDAR occupancy guidance (LOG) to generate 3D features with sufficient depth information. Next, a LiDAR-guided adaptive fusion transformer (LGAFT) is developed to adaptively enhance the interaction of different modal BEV features from a global perspective. Meanwhile, additional downsampling with sparse height compression and a multi-scale dual-path transformer (MSDPT) are designed to enlarge the receptive fields of different modal features. Finally, a temporal fusion module is introduced to aggregate features from previous frames. GAFusion achieves state-of-the-art 3D object detection results with 73.6% mAP and 74.9% NDS on the nuScenes test set.
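To make the idea of sparse depth from LiDAR more concrete, the sketch below projects LiDAR points into a camera image and keeps the nearest depth per pixel. It only illustrates the generic projection step that depth guidance of this kind usually starts from, not GAFusion's SDG module itself. The function name `sparse_depth_map`, its parameters, and the convention that empty pixels hold 0 are all assumptions made for this example.

```python
import numpy as np


def sparse_depth_map(points_lidar, lidar_to_cam, intrinsics, height, width):
    """Hypothetical helper: rasterize LiDAR points into a sparse depth image.

    points_lidar: (N, 3) xyz points in the LiDAR frame.
    lidar_to_cam: (4, 4) rigid transform from LiDAR frame to camera frame.
    intrinsics:   (3, 3) pinhole camera matrix.
    Returns an (height, width) array; pixels with no LiDAR return are 0.
    """
    # Homogeneous coordinates, shape (N, 4).
    ones = np.ones((points_lidar.shape[0], 1))
    pts_h = np.hstack([points_lidar, ones])

    # Move points into the camera frame and drop the homogeneous row.
    pts_cam = (lidar_to_cam @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera.
    pts_cam = pts_cam[pts_cam[:, 2] > 1e-3]

    # Pinhole projection to pixel coordinates.
    uvw = (intrinsics @ pts_cam.T).T
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)
    z = pts_cam[:, 2]

    # Discard projections that fall outside the image.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]

    # When several points hit one pixel, keep the closest one.
    depth = np.full((height, width), np.inf)
    np.minimum.at(depth, (v, u), z)
    depth[np.isinf(depth)] = 0.0
    return depth
```

In a pipeline of the kind the abstract describes, a map like this would typically be passed alongside image features to a depth estimation network. How GAFusion consumes it is not specified in this summary.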

