GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection
This work addresses a critical issue in autonomous driving, robustness to sensor calibration errors, although the contribution is incremental because it builds on existing BEV fusion methods.
The paper tackles inaccurate calibration between LiDAR and camera sensors in multi-modal 3D object detection for autonomous driving, a problem that causes feature misalignment. It proposes GraphBEV, a robust fusion framework that achieves state-of-the-art performance of 70.1% mAP on the nuScenes validation set, surpassing BEVFusion by 1.6%, and outperforms BEVFusion by 8.3% under misalignment noise.
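To make the calibration problem concrete, the sketch below projects a single LiDAR point into the image with and without a small rotation error in the LiDAR-to-camera extrinsics. It is a hypothetical illustration of standard pinhole projection, not code from GraphBEV: the intrinsics, the point, and the assumption that the nominal extrinsic is the identity are all made up for the example.

```python
import math


def project(point, yaw_error_deg, fx, fy, cx, cy):
    """Project a 3D point into pixel coordinates with a pinhole camera.

    Assumption (hypothetical setup): LiDAR and camera frames coincide except
    for a calibration error modelled as a rotation about the camera's
    vertical y-axis. Camera convention: x right, y down, z forward.
    Returns (u, v, depth).
    """
    x, y, z = point
    a = math.radians(yaw_error_deg)
    # Apply the (erroneous) rotation about the y-axis.
    xc = math.cos(a) * x + math.sin(a) * z
    yc = y
    zc = -math.sin(a) * x + math.cos(a) * z
    # Pinhole projection onto the image plane.
    u = fx * xc / zc + cx
    v = fy * yc / zc + cy
    return u, v, zc


if __name__ == "__main__":
    p = (2.0, 0.5, 30.0)  # a point 30 m ahead, slightly right and below
    u0, v0, _ = project(p, 0.0, 1000.0, 1000.0, 800.0, 450.0)
    u1, v1, _ = project(p, 1.0, 1000.0, 1000.0, 800.0, 450.0)
    print(f"horizontal shift from a 1 degree error: {u1 - u0:.2f} px")
```

With these illustrative numbers, a 1 degree error moves the projected point well over ten pixels sideways, so its LiDAR depth lands on the wrong image location. Depth supervision or depth features for the camera branch then end up attached to the wrong pixels, which is the source of the BEV misalignment the paper targets.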
Integrating LiDAR and camera information into a Bird's-Eye-View (BEV) representation has become a crucial aspect of 3D object detection in autonomous driving. However, existing methods are sensitive to inaccurate calibration between the LiDAR and camera sensors. Such inaccuracies cause errors in depth estimation for the camera branch, ultimately misaligning the LiDAR and camera BEV features. In this work, we propose a robust fusion framework called GraphBEV. To address errors caused by inaccurate point cloud projection, we introduce a Local Align module that employs neighbor-aware depth features via graph matching. Additionally, we propose a Global Align module to rectify the misalignment between LiDAR and camera BEV features. GraphBEV achieves state-of-the-art performance, with an mAP of 70.1%, surpassing BEVFusion by 1.6% on the nuScenes validation set. Importantly, GraphBEV outperforms BEVFusion by 8.3% under misalignment noise.
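The idea behind "neighbor-aware depth features" can be sketched, under loose assumptions, as building a k-nearest-neighbor graph over the projected LiDAR points in the image plane and collecting each point's neighbor depths instead of trusting only its own. This is a hypothetical simplification for intuition only: GraphBEV's actual Local Align module uses graph matching with learned features, and the function names `knn_graph` and `neighbor_depth_features` are invented here.

```python
import math


def knn_graph(pixels, k):
    """For each projected point, return the indices of its k nearest
    neighbours in the image plane (the point itself is included first).

    Brute force O(N^2) for clarity; real systems would use a spatial index.
    """
    graph = []
    for ui, vi in pixels:
        dists = sorted(
            (math.hypot(ui - uj, vi - vj), j) for j, (uj, vj) in enumerate(pixels)
        )
        graph.append([j for _, j in dists[:k]])
    return graph


def neighbor_depth_features(pixels, depths, k):
    """Hypothetical neighbour-aware depth feature: the sorted depths of a
    point's k nearest projected neighbours, followed by their mean."""
    feats = []
    for nbrs in knn_graph(pixels, k):
        nd = sorted(depths[j] for j in nbrs)
        feats.append(nd + [sum(nd) / len(nd)])
    return feats


if __name__ == "__main__":
    pixels = [(100.0, 100.0), (102.0, 100.0), (100.0, 103.0), (300.0, 300.0)]
    depths = [10.0, 10.5, 11.0, 40.0]
    for f in neighbor_depth_features(pixels, depths, k=3):
        print(f)
```

The design intuition, again only as an assumption about why neighborhoods help: if calibration error shifts a projection by a few pixels, the single nearest LiDAR depth may belong to the wrong surface, whereas a small set of neighbor depths gives a downstream network several plausible candidates to weigh.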