OccFusion: Depth Estimation Free Multi-sensor Fusion for 3D Occupancy Prediction
The work addresses the need for more accurate and robust 3D scene understanding in autonomous driving systems, though it appears incremental because it builds on existing fusion-based methods.
The paper tackles the problem of 3D occupancy prediction for autonomous driving by proposing OccFusion, a multi-sensor fusion framework that eliminates the need for depth estimation, achieving superior performance on benchmarks like nuScenes-Occupancy and nuScenes-Occ3D.
3D occupancy prediction based on multi-sensor fusion, crucial for a reliable autonomous driving system, enables fine-grained understanding of 3D scenes. Previous fusion-based 3D occupancy prediction methods relied on depth estimation for processing 2D image features. However, depth estimation is an ill-posed problem, hindering the accuracy and robustness of these methods. Furthermore, fine-grained occupancy prediction demands extensive computational resources. To address these issues, we propose OccFusion, a depth-estimation-free multi-modal fusion framework. Additionally, we introduce a generalizable active training method and an active decoder that can be applied to any occupancy prediction model, with the potential to enhance its performance. Experiments conducted on nuScenes-Occupancy and nuScenes-Occ3D demonstrate our framework's superior performance. Detailed ablation studies highlight the effectiveness of each proposed method.