MBDF-Net: Multi-Branch Deep Fusion Network for 3D Object Detection
This work addresses multi-modal data fusion for 3D object detection. The contribution is incremental: it builds on existing methods and adds specific enhancements.
The paper tackles the challenge of fusing point clouds and images for 3D object detection by proposing MBDF-Net, which uses adaptive attention fusion and an attention-based hybrid sampling strategy, and it reports improved results on the KITTI and SUN-RGBD benchmarks.
Point clouds and images provide complementary information when representing 3D objects, and fusing the two kinds of data usually improves detection results. However, fusing the two modalities is challenging because of their different characteristics and the interference from areas that are not of interest. To solve this problem, we propose a Multi-Branch Deep Fusion Network (MBDF-Net) for 3D object detection. The proposed detector has two stages. In the first stage, our multi-branch feature extraction network uses Adaptive Attention Fusion (AAF) modules to produce cross-modal fusion features from single-modal semantic features. In the second stage, we use a region of interest (RoI)-pooled fusion module to generate enhanced local features for refinement. We also propose a novel attention-based hybrid sampling strategy for selecting key points during downsampling. We evaluate our approach on two widely used benchmark datasets, KITTI and SUN-RGBD. The experimental results demonstrate the advantages of our method over state-of-the-art approaches.
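To make the idea of attention-weighted cross-modal fusion concrete, the following is a minimal sketch and not the AAF module itself, whose internals the abstract does not specify. It assumes that image features have already been sampled at each point's projected pixel location, and that each modality receives a sigmoid gate computed from the concatenated features. The function name `attention_fuse` and the gate weights `w_point` and `w_image` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def attention_fuse(point_feats, image_feats, w_point, w_image):
    """Fuse per-point features from two modalities with per-point gates.

    point_feats, image_feats: (N, C) arrays. The image features are assumed
        to be already gathered at each point's projected pixel location.
    w_point, w_image: (2C,) hypothetical gate weights (learned in practice).
    Returns an (N, C) array of fused features.
    """
    # Both gates look at the joint evidence from the two modalities.
    joint = np.concatenate([point_feats, image_feats], axis=1)  # (N, 2C)
    gate_p = sigmoid(joint @ w_point)[:, None]  # (N, 1) weight for point branch
    gate_i = sigmoid(joint @ w_image)[:, None]  # (N, 1) weight for image branch
    # A low gate suppresses a modality where it is unreliable for that point.
    return gate_p * point_feats + gate_i * image_feats


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_points, channels = 5, 4
    pts = rng.normal(size=(n_points, channels))
    img = rng.normal(size=(n_points, channels))
    fused = attention_fuse(
        pts, img,
        rng.normal(size=2 * channels),
        rng.normal(size=2 * channels),
    )
    print(fused.shape)
```

Under these assumptions, a per-point gate lets the network down-weight image features for points that project onto background pixels, which is one plausible way to limit interference from areas that are not of interest.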