DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction
This work addresses the challenge of deploying high-performance 3D semantic occupancy prediction models for autonomous driving and robotics by reducing computational demands and improving efficiency.
This paper proposes DAOcc, a multi-modal occupancy prediction framework that uses 3D object detection supervision to improve performance while keeping a deployment-friendly image backbone and a practical input resolution. It achieves new state-of-the-art results on the Occ3D-nuScenes and Occ3D-Waymo benchmarks, outperforming previous methods while using only a ResNet-50 backbone and 256×704 input resolution, and with TensorRT optimization it reaches 104.9 FPS at 54.2 mIoU on an NVIDIA RTX 4090 GPU.
Multi-sensor fusion significantly enhances the accuracy and robustness of 3D semantic occupancy prediction, which is crucial for autonomous driving and robotics. However, most existing approaches depend on high-resolution images and complex networks to achieve top performance, hindering their deployment in practical scenarios. Moreover, current multi-sensor fusion approaches mainly focus on improving feature fusion while largely neglecting effective supervision strategies for those features. To address these issues, we propose DAOcc, a novel multi-modal occupancy prediction framework that leverages 3D object detection supervision to assist in achieving superior performance, while using a deployment-friendly image backbone and practical input resolution. In addition, we introduce a BEV View Range Extension strategy to mitigate performance degradation caused by lower image resolution. Extensive experiments demonstrate that DAOcc achieves new state-of-the-art results on both the Occ3D-nuScenes and Occ3D-Waymo benchmarks, and outperforms previous state-of-the-art methods by a significant margin using only a ResNet-50 backbone and 256×704 input resolution. With TensorRT optimization, DAOcc reaches 104.9 FPS while maintaining 54.2 mIoU on an NVIDIA RTX 4090 GPU. Code is available at https://github.com/AlphaPlusTT/DAOcc.
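
For readers unfamiliar with the mIoU metric quoted above, the following is a minimal sketch of how a mean intersection-over-union score can be computed over flattened voxel labels. It is not taken from the DAOcc code base: the function name `mean_iou`, the flat-list input format, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration, and the sketch ignores any visibility masking that a benchmark's evaluation protocol may apply.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over classes that occur in pred or target (illustrative helper)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # voxels where pred == target == c
    pred_count = [0] * num_classes    # voxels predicted as class c
    target_count = [0] * num_classes  # voxels labelled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # assumption: classes absent from both are skipped
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0
```

For example, with predictions `[0, 0, 1, 1]` and labels `[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. Papers typically report this value scaled to a percentage.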