RC-BEVFusion: A Plug-In Module for Radar-Camera Bird's Eye View Feature Fusion
This work addresses a key bottleneck in sensor fusion for autonomous vehicles with a modular radar branch that improves detection performance, though the contribution is incremental: it builds on existing camera-based methods rather than replacing them.
The paper tackles the problem of fusing sparse radar point clouds with dense camera images for 3D object detection in autonomous driving. It reports an increase of up to 28% in the nuScenes detection score (NDS) and the best result among published methods in the radar-camera fusion category.
Radars and cameras are among the most frequently used sensors for advanced driver assistance systems and automated driving research. However, there has been surprisingly little research on radar-camera fusion with neural networks. One reason is the lack of large-scale automotive datasets with radar and unmasked camera data, with the exception of the nuScenes dataset. Another is the difficulty of effectively fusing the sparse radar point cloud on the bird's eye view (BEV) plane with the dense images on the perspective plane. The recent trend of camera-based 3D object detection using BEV features has enabled a new type of fusion that is better suited to radars. In this work, we present RC-BEVFusion, a modular radar-camera fusion network on the BEV plane. We propose BEVFeatureNet, a novel radar encoder branch, and show that it can be incorporated into several state-of-the-art camera-based architectures. We show significant performance gains of up to 28% in the nuScenes detection score, which is an important step in radar-camera fusion research. Without tuning our model for the nuScenes benchmark, we achieve the best result among all published methods in the radar-camera fusion category.
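To make the idea of fusion on the BEV plane concrete, the following is a minimal sketch and not the paper's BEVFeatureNet. It assumes radar points are scattered into a BEV grid by simple mean pooling per cell and that fusion is a channel-wise concatenation with camera BEV features on the same grid. The function names `radar_to_bev` and `fuse_bev`, the grid extent of ±51.2 m, the 0.8 m cell size, and the feature layout are all hypothetical choices made for illustration.

```python
import numpy as np

def radar_to_bev(points, feats, x_range=(-51.2, 51.2), y_range=(-51.2, 51.2), cell=0.8):
    """Scatter per-point radar features into a BEV grid (mean pooling per cell).

    points: (N, 2) x/y positions in metres in the ego frame.
    feats:  (N, C) per-point features (hypothetical layout, e.g. RCS, velocity).
    Returns a (C, H, W) array; cells without radar points stay zero.
    Grid extent and cell size are assumptions, not values from the paper.
    """
    w = int(round((x_range[1] - x_range[0]) / cell))
    h = int(round((y_range[1] - y_range[0]) / cell))
    ix = np.floor((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / cell).astype(int)

    # Drop points that fall outside the grid.
    keep = (ix >= 0) & (ix < w) & (iy >= 0) & (iy < h)
    ix, iy, feats = ix[keep], iy[keep], feats[keep]

    grid = np.zeros((feats.shape[1], h, w), dtype=np.float64)
    count = np.zeros((h, w), dtype=np.float64)
    np.add.at(count, (iy, ix), 1.0)  # unbuffered add handles repeated cells
    for ch in range(feats.shape[1]):
        np.add.at(grid[ch], (iy, ix), feats[:, ch])
    grid /= np.maximum(count, 1.0)   # mean over the points in each cell
    return grid

def fuse_bev(camera_bev, radar_bev):
    """Concatenate camera and radar BEV features along the channel axis."""
    assert camera_bev.shape[1:] == radar_bev.shape[1:], "BEV grids must match"
    return np.concatenate([camera_bev, radar_bev], axis=0)
```

Because both inputs live on the same BEV grid, the fused tensor can be passed to whatever BEV backbone and detection head the camera-only network already uses, which is what makes a plug-in radar branch possible. A learned radar encoder would replace the mean pooling used here.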