InsFusion: Rethink Instance-level LiDAR-Camera Fusion for 3D Object Detection
This work addresses a critical problem for autonomous driving and smart transportation systems, although it appears incremental because it builds on existing baseline methods.
The paper tackles noise and error accumulation in 3D object detection from LiDAR and camera data by proposing InsFusion. InsFusion extracts proposals from both raw and fused features, uses these proposals to query the raw features, and applies attention mechanisms to the raw features. It achieves new state-of-the-art performance on the nuScenes dataset.
3D object detection from multi-view cameras and LiDAR is a crucial component of autonomous driving and smart transportation. However, noise and errors gradually accumulate through basic feature extraction, perspective transformation, and feature fusion. To address this issue, we propose InsFusion, which extracts proposals from both raw and fused features and uses these proposals to query the raw features, thereby mitigating the impact of accumulated errors. Additionally, InsFusion applies attention mechanisms to the raw features, which further reduces the influence of accumulated errors. Experiments on the nuScenes dataset demonstrate that InsFusion is compatible with various advanced baseline methods and achieves new state-of-the-art performance for 3D object detection.
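As a rough illustration of the instance-level querying idea described above, the following NumPy sketch selects top-k proposals from raw and fused features and lets them attend back to the raw features through single-head scaled dot-product cross-attention. Everything here is an assumption made for illustration, not InsFusion's actual implementation: the flattened bird's-eye-view (BEV) token layout, the shared linear scoring head `w_score`, the hypothetical helper names `select_proposals`, `cross_attention`, and `instance_query`, the residual update, and the omission of learned query/key/value projections.

```python
import numpy as np


def select_proposals(scores, k):
    """Return indices of the k highest-scoring BEV cells (hypothetical proposal step)."""
    return np.argsort(-scores)[:k]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention; each query attends over all key tokens.

    Learned Q/K/V projections are omitted for brevity (an assumption of this sketch).
    """
    d = queries.shape[-1]
    logits = queries @ keys.T / np.sqrt(d)          # (Q, N) similarity scores
    logits -= logits.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the N key tokens
    return weights @ values, weights                # (Q, C) aggregated features, (Q, N) weights


def instance_query(raw, fused, w_score, k_raw, k_fused):
    """Hypothetical instance-level query step.

    raw, fused: (N, C) flattened BEV feature tokens (assumed layout).
    w_score:    (C,) shared linear scoring head (assumed).
    """
    # Score every cell in both feature maps and keep the top-k from each source.
    idx_raw = select_proposals(raw @ w_score, k_raw)
    idx_fused = select_proposals(fused @ w_score, k_fused)
    # Proposal features from both sources become the instance queries.
    queries = np.concatenate([raw[idx_raw], fused[idx_fused]], axis=0)
    # Queries read back from the raw features (keys and values) rather than the fused map.
    refined, weights = cross_attention(queries, raw, raw)
    return queries + refined, weights               # residual update (assumed)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.standard_normal((100, 16))    # 100 BEV cells, 16 channels
    fused = rng.standard_normal((100, 16))
    w_score = rng.standard_normal(16)
    out, attn = instance_query(raw, fused, w_score, k_raw=5, k_fused=3)
    print(out.shape, attn.shape)
```

In this sketch the raw features serve as the attention keys and values: the queries may originate from the fused map, but the information they aggregate is read from features that have not gone through the fusion stage, which mirrors the motivation stated in the abstract.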