CV

Dec 29, 2024

MR-Occ: Efficient Camera-LiDAR 3D Semantic Occupancy Prediction Using Hierarchical Multi-Resolution Voxel Representation

Minjae Seong, Jisong Kim, Geonho Bang, Hawook Jeong, Jun Won Choi

arXiv:2412.20480v1

7.64 citations

h-index: 3

Originality: Incremental advance

AI Analysis

This work addresses inefficiencies and occlusion challenges in 3D perception for autonomous driving, offering an incremental improvement over existing camera-LiDAR fusion methods.

The paper tackles inefficient and inaccurate 3D semantic occupancy prediction in autonomous driving by proposing MR-Occ, which uses a hierarchical multi-resolution voxel representation to concentrate computation on critical voxels and to handle occlusions. It reports state-of-the-art results on the nuScenes-Occupancy dataset, with gains of +5.2% in IoU and +5.3% in mIoU over previous approaches, while using fewer parameters and FLOPs.

Accurate 3D perception is essential for understanding the environment in autonomous driving. Recent advancements in 3D semantic occupancy prediction have leveraged camera-LiDAR fusion to improve robustness and accuracy. However, current methods allocate computational resources uniformly across all voxels, leading to inefficiency, and they also fail to adequately address occlusions, resulting in reduced accuracy in challenging scenarios. We propose MR-Occ, a novel approach for camera-LiDAR fusion-based 3D semantic occupancy prediction, addressing these challenges through three key components: Hierarchical Voxel Feature Refinement (HVFR), Multi-scale Occupancy Decoder (MOD), and Pixel to Voxel Fusion Network (PVF-Net). HVFR improves performance by enhancing features for critical voxels, reducing computational cost. MOD introduces an 'occluded' class to better handle regions obscured from sensor view, improving accuracy. PVF-Net leverages densified LiDAR features to effectively fuse camera and LiDAR data through a deformable attention mechanism. Extensive experiments demonstrate that MR-Occ achieves state-of-the-art performance on the nuScenes-Occupancy dataset, surpassing previous approaches by +5.2% in IoU and +5.3% in mIoU while using fewer parameters and FLOPs. Moreover, MR-Occ demonstrates superior performance on the SemanticKITTI dataset, further validating its effectiveness and generalizability across diverse 3D semantic occupancy benchmarks.
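For readers unfamiliar with the two reported metrics, the sketch below shows how IoU and mIoU are commonly computed for semantic occupancy grids. It is a minimal illustration, not the paper's evaluation code: the function names, the convention that label 0 means "free", and the toy label arrays are all assumptions, and the official nuScenes-Occupancy protocol may differ in details such as ignored classes or visibility masks.

```python
import numpy as np

def occupancy_iou(pred, gt, free_label=0):
    """Geometric IoU: any non-free voxel counts as 'occupied' (assumed convention)."""
    pred_occ = pred != free_label
    gt_occ = gt != free_label
    inter = np.logical_and(pred_occ, gt_occ).sum()
    union = np.logical_or(pred_occ, gt_occ).sum()
    return float(inter / union) if union > 0 else float("nan")

def semantic_miou(pred, gt, num_classes, free_label=0):
    """Mean of per-class IoU over semantic classes, skipping the free class."""
    ious = []
    for c in range(num_classes):
        if c == free_label:
            continue
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class absent from both prediction and ground truth
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")

# Toy flattened voxel labels (hypothetical): 0 = free, 1 and 2 = semantic classes.
gt = np.array([0, 1, 1, 2, 2, 0])
pred = np.array([0, 1, 2, 2, 2, 1])

print(occupancy_iou(pred, gt))     # geometric IoU
print(semantic_miou(pred, gt, 3))  # mean IoU over classes 1 and 2
```

In this toy case the geometric IoU is 4/5, while the per-class IoUs are 1/3 and 2/3, so a prediction can score well on occupancy yet noticeably lower on semantics, which is why the two numbers are reported separately.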
