CV · RO · Dec 2, 2025

BEVDilation: LiDAR-Centric Multi-Modal Fusion for 3D Object Detection

arXiv:2512.02972v1 · 3 citations · h-index: 13 · Has Code
Originality: Highly original
AI Analysis

This work addresses a key challenge in autonomous driving by improving multi-modal fusion for 3D object detection, though it is incremental as it builds on existing BEV-based methods.

The paper tackles the performance degradation that arises in LiDAR-camera fusion for 3D object detection because the two sensors differ in geometric accuracy. It proposes BEVDilation, a LiDAR-centric framework that uses image features as implicit guidance rather than fusing them directly. On the nuScenes benchmark, the paper reports better performance than state-of-the-art methods, with competitive computational efficiency and greater robustness to depth noise than naive fusion.

Integrating LiDAR and camera information in the bird's eye view (BEV) representation has demonstrated its effectiveness in 3D object detection. However, because of the fundamental disparity in geometric accuracy between these sensors, indiscriminate fusion in previous methods often leads to degraded performance. In this paper, we propose BEVDilation, a novel LiDAR-centric framework that prioritizes LiDAR information in the fusion. By formulating image BEV features as implicit guidance rather than naive concatenation, our strategy effectively alleviates the spatial misalignment caused by image depth estimation errors. Furthermore, the image guidance helps the LiDAR-centric paradigm address the sparsity and semantic limitations of point clouds. Specifically, we propose a Sparse Voxel Dilation Block that mitigates the inherent point sparsity by densifying foreground voxels through image priors. Moreover, we introduce a Semantic-Guided BEV Dilation Block to enhance the LiDAR feature diffusion process with image semantic guidance and long-range context capture. On the challenging nuScenes benchmark, BEVDilation achieves better performance than state-of-the-art methods while maintaining competitive computational efficiency. Importantly, our LiDAR-centric strategy demonstrates greater robustness to depth noise compared to naive fusion. The source code is available at https://github.com/gwenzhang/BEVDilation.
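
As a rough illustration of the densification idea behind the Sparse Voxel Dilation Block, the toy sketch below grows occupied cells of a 2D BEV grid into neighbouring cells, but only where an image-derived prior marks the region as foreground. This is not the authors' implementation (the linked repository is the reference for that). The function name `dilate_foreground`, the `fg_prob` grid, and the fixed `threshold` are hypothetical, and a thresholded probability map on a dense 2D grid stands in for learned image priors over sparse 3D voxels.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def dilate_foreground(
    occupied: Set[Cell],
    fg_prob: List[List[float]],
    threshold: float = 0.5,
) -> Set[Cell]:
    """Grow occupied cells into 3x3 neighbours that the prior marks as foreground.

    `fg_prob` is an assumed image-derived foreground probability per BEV cell.
    """
    height, width = len(fg_prob), len(fg_prob[0])
    dilated = set(occupied)  # original LiDAR cells are always kept
    for row, col in occupied:
        # Only occupied cells that the prior considers foreground act as seeds.
        if fg_prob[row][col] < threshold:
            continue
        for d_row in (-1, 0, 1):
            for d_col in (-1, 0, 1):
                n_row, n_col = row + d_row, col + d_col
                inside = 0 <= n_row < height and 0 <= n_col < width
                # Add a neighbour only if the prior also marks it as foreground.
                if inside and fg_prob[n_row][n_col] >= threshold:
                    dilated.add((n_row, n_col))
    return dilated


# Toy 4x4 BEV grid: a 2x2 foreground object in the centre, background elsewhere.
fg_prob = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.8, 0.1],
    [0.1, 0.9, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
occupied = {(1, 1), (3, 3)}  # one LiDAR hit on the object, one on background
dense = dilate_foreground(occupied, fg_prob)
print(sorted(dense))
```

In this toy case the single hit on the object spreads to the other three object cells, while the background hit at `(3, 3)` is kept but not dilated. Keeping the LiDAR cells untouched and using the image only to decide where to add cells mirrors the LiDAR-centric framing of the abstract, in which image features guide rather than overwrite the point cloud.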

Code Implementations: 1 repo
