CV · Mar 5, 2025

BEVMOSNet: Multimodal Fusion for BEV Moving Object Segmentation

arXiv:2503.03280v1 · 2 citations · h-index: 13 · VISIGRAPP: VISAPP
AI Analysis

The work addresses the need for reliable obstacle avoidance in autonomous vehicles, especially in low-light and adverse weather conditions. It represents a strong gain within a specific domain.

The paper tackles the problem of accurately segmenting moving objects in bird's-eye-view (BEV) for autonomous vehicles. It introduces BEVMOSNet, the first end-to-end multimodal fusion method using cameras, LiDAR, and radar. On the nuScenes dataset, it improves the IoU score by 36.59% over a vision-based baseline and by 2.35% over a multimodal baseline.

Accurate motion understanding of the dynamic objects within the scene in bird's-eye-view (BEV) is critical to ensure a reliable obstacle avoidance system and smooth path planning for autonomous vehicles. However, this task has received relatively limited exploration when compared to object detection and segmentation, with only a few recent vision-based approaches presenting preliminary findings that significantly deteriorate in low-light, nighttime, and adverse weather conditions such as rain. Conversely, LiDAR and radar sensors remain almost unaffected in these scenarios, and radar provides key velocity information of the objects. Therefore, we introduce BEVMOSNet, to our knowledge the first end-to-end multimodal fusion leveraging cameras, LiDAR, and radar to precisely predict the moving objects in BEV. In addition, we perform a deeper analysis to find the optimal strategy for deformable cross-attention-guided sensor fusion for cross-sensor knowledge sharing in BEV. Evaluating BEVMOSNet on the nuScenes dataset, we show an overall improvement in IoU score of 36.59% compared to the vision-based unimodal baseline BEV-MoSeg (Sigatapu et al., 2023), and 2.35% compared to the multimodal SimpleBEV (Harley et al., 2022), extended for the motion segmentation task, establishing this method as the state-of-the-art in BEV motion segmentation.
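The results above are reported as IoU (intersection over union) scores. As a reminder of what that metric measures, here is a minimal sketch of IoU between a predicted and a ground-truth binary BEV mask. It is not the paper's evaluation code: the function name `bev_iou`, the list-of-rows grid representation, and the convention for two empty masks are all assumptions made for illustration.

```python
def bev_iou(pred, target):
    """IoU between two binary BEV grids given as equally sized lists of rows.

    Hypothetical helper for illustration; cells are truthy where a moving
    object is marked and falsy elsewhere.
    """
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("pred and target must have the same shape")

    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p, t = bool(p), bool(t)
            intersection += p and t  # cell is moving in both masks
            union += p or t          # cell is moving in at least one mask

    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Toy 2x3 grids: 2 cells overlap, 4 cells are marked in at least one mask.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
print(bev_iou(pred, target))
```

In the toy grids the two masks agree on two moving cells and disagree on two others, so the score is 2 / 4.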
