REOcc: Camera-Radar Fusion with Radar Feature Enrichment for 3D Occupancy Prediction
This work addresses the limitation of vision-based 3D occupancy prediction for autonomous vehicles by improving sensor fusion, though it appears incremental as it builds on existing camera-radar fusion methods.
The paper tackles 3D occupancy prediction in challenging environments by proposing REOcc, a camera-radar fusion network that enriches radar features to mitigate their sparsity and noise. On the Occ3D-nuScenes benchmark, REOcc achieves significant performance gains over a camera-only baseline, particularly on dynamic object classes.
Vision-based 3D occupancy prediction has advanced significantly, but methods that rely on cameras alone struggle in challenging environments. This limitation has driven the adoption of sensor fusion, among which camera-radar fusion stands out as a promising solution because the two sensors have complementary strengths. However, the sparsity and noise of radar data limit its effectiveness, leading to suboptimal fusion performance. In this paper, we propose REOcc, a novel camera-radar fusion network designed to enrich radar feature representations for 3D occupancy prediction. Our approach introduces two main components, a Radar Densifier and a Radar Amplifier, which refine radar features by integrating spatial and contextual information, enhancing both their spatial density and their quality. Extensive experiments on the Occ3D-nuScenes benchmark demonstrate that REOcc achieves significant performance gains over the camera-only baseline model, particularly on dynamic object classes. These results underscore REOcc's capability to mitigate the sparsity and noise of radar data. Consequently, radar complements camera data more effectively, unlocking the full potential of camera-radar fusion for robust and reliable 3D occupancy prediction.
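To make the sparsity problem concrete, the following is a minimal toy sketch, not the REOcc implementation. The abstract does not specify how the Radar Densifier or Radar Amplifier work, so everything here is an assumption made for illustration: the 4x4 bird's-eye-view grid, the function names `densify` and `amplify`, the 3x3 neighbour averaging, and the use of a camera-derived context weight. It only shows how spreading sparse radar responses to neighbouring cells raises grid coverage, and how a context weight can emphasise selected cells.

```python
from typing import List

Grid = List[List[float]]


def occupancy_ratio(grid: Grid) -> float:
    """Fraction of cells that hold a non-zero radar feature."""
    cells = [v for row in grid for v in row]
    return sum(1 for v in cells if v != 0.0) / len(cells)


def densify(grid: Grid) -> Grid:
    """Hypothetical densification: fill each empty cell with the mean of
    the non-zero values in its 3x3 neighbourhood (if there are any)."""
    h, w = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(h):
        for j in range(w):
            if grid[i][j] != 0.0:
                continue  # keep original radar responses untouched
            neighbours = [
                grid[y][x]
                for y in range(max(0, i - 1), min(h, i + 2))
                for x in range(max(0, j - 1), min(w, j + 2))
                if grid[y][x] != 0.0
            ]
            if neighbours:
                out[i][j] = sum(neighbours) / len(neighbours)
    return out


def amplify(grid: Grid, context: Grid) -> Grid:
    """Hypothetical amplification: scale each cell by (1 + context weight)."""
    return [
        [r * (1.0 + c) for r, c in zip(radar_row, context_row)]
        for radar_row, context_row in zip(grid, context)
    ]


# A sparse radar grid with only two returns (assumed toy data).
radar = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
]

# Assumed context weights, e.g. derived from camera features.
camera_context = [[0.0] * 4 for _ in range(4)]
camera_context[1][1] = 0.5

dense = densify(radar)
enriched = amplify(dense, camera_context)

print(occupancy_ratio(radar), occupancy_ratio(dense))
```

In this toy case the share of non-empty cells rises from 2 of 16 to 12 of 16 after densification, while the weighted cell is scaled up and the rest are left unchanged. A learned module would replace both hand-written rules with trained layers.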