CV | Oct 13, 2023

Revisiting Multi-modal 3D Semantic Segmentation in Real-world Autonomous Driving

Feng Jiang, Chaoping Tu, Gang Zhang, Jun Li, Hanqing Huang, Junyu Lin, Di Feng, Jian Pu

arXiv:2310.08826v1 | 3.93 citations | h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses the real-time and safety requirements of autonomous driving, offering incremental improvements in robustness to LiDAR-camera calibration errors.

The paper tackles two challenges in multi-modal 3D semantic segmentation for autonomous driving: efficient deployment, and performance degradation under weak calibration. It proposes CPGNet-LCF, which achieves state-of-the-art performance on the nuScenes and SemanticKITTI benchmarks and runs at 20 ms per frame on a single Tesla V100 GPU.

LiDAR and camera are two critical sensors for multi-modal 3D semantic segmentation and should be fused efficiently and robustly to ensure safety in diverse real-world scenarios. However, existing multi-modal methods face two key challenges: 1) difficulty with efficient deployment and real-time execution; and 2) drastic performance degradation under weak calibration between LiDAR and cameras. To address these challenges, we propose CPGNet-LCF, a new multi-modal fusion framework that extends the LiDAR-only CPGNet. CPGNet-LCF solves the first challenge by inheriting the easy deployment and real-time capabilities of CPGNet. For the second challenge, we introduce a novel weak calibration knowledge distillation strategy during training to improve robustness to weak calibration. CPGNet-LCF achieves state-of-the-art performance on the nuScenes and SemanticKITTI benchmarks. Remarkably, it can be easily deployed to run in 20 ms per frame on a single Tesla V100 GPU using TensorRT FP16 mode. Furthermore, we benchmark performance over four weak calibration levels, demonstrating the robustness of our proposed approach.
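The abstract does not specify how weak calibration is simulated or which distillation objective is used. The following is a minimal sketch under stated assumptions, not the CPGNet-LCF implementation. It assumes weak calibration can be modeled as a small random rigid perturbation of the 4x4 LiDAR-to-camera extrinsic, and shows how that perturbation shifts projected LiDAR points in the image. It also shows a generic, Hinton-style temperature-scaled KL loss that could align a student's per-point class logits (perturbed calibration) with a teacher's (accurate calibration). All function names, the uniform noise model, the camera intrinsics, and the four noise levels are hypothetical illustrations; they are not the paper's four calibration levels.

```python
import numpy as np


def rotation_from_euler(roll, pitch, yaw):
    """3x3 rotation R = Rz(yaw) @ Ry(pitch) @ Rx(roll); angles in radians."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    return rz @ ry @ rx


def perturb_extrinsic(t_cam_lidar, max_rot_deg, max_trans_m, rng):
    """Hypothetical weak-calibration model: left-multiply the true 4x4
    LiDAR-to-camera extrinsic by a random small rigid transform with
    uniform per-axis rotation (degrees) and translation (metres)."""
    angles = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg, size=3))
    noise = np.eye(4)
    noise[:3, :3] = rotation_from_euler(*angles)
    noise[:3, 3] = rng.uniform(-max_trans_m, max_trans_m, size=3)
    return noise @ t_cam_lidar


def project_points(points, t_cam_lidar, k):
    """Project (N, 3) LiDAR points to (N, 2) pixel coordinates and (N,) depths."""
    homo = np.hstack([points, np.ones((points.shape[0], 1))])
    cam = (t_cam_lidar @ homo.T).T[:, :3]  # points expressed in the camera frame
    pix = (k @ cam.T).T
    return pix[:, :2] / pix[:, 2:3], cam[:, 2]


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean per-point KL(teacher || student) on temperature-softened class
    distributions, scaled by T^2 (a generic KD objective, assumed here)."""
    p_t = softmax(teacher_logits / temperature)
    log_p_t = np.log(p_t + 1e-12)
    log_p_s = np.log(softmax(student_logits / temperature) + 1e-12)
    kl = (p_t * (log_p_t - log_p_s)).sum(axis=-1)
    return float(kl.mean() * temperature**2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    t_true = np.eye(4)  # assume the LiDAR frame is already camera-aligned (z forward)
    points = rng.uniform([-5.0, -2.0, 5.0], [5.0, 2.0, 40.0], size=(1000, 3))
    uv_true, _ = project_points(points, t_true, k)
    # Four illustrative (rotation deg, translation m) levels; not the paper's settings.
    levels = [(0.5, 0.02), (1.0, 0.05), (2.0, 0.1), (4.0, 0.2)]
    for i, (rot, trans) in enumerate(levels, start=1):
        t_weak = perturb_extrinsic(t_true, rot, trans, rng)
        uv_weak, depth = project_points(points, t_weak, k)
        valid = depth > 0  # keep only points in front of the camera
        shift = np.linalg.norm(uv_weak[valid] - uv_true[valid], axis=1).mean()
        print(f"level {i}: mean pixel shift {shift:.1f} px")
    # Stand-in logits; in practice these would come from teacher/student networks.
    teacher = rng.normal(size=(1000, 16))
    student = teacher + 0.5 * rng.normal(size=(1000, 16))
    print(f"KD loss: {distillation_loss(student, teacher):.4f}")
```

Because the extrinsic error scales with depth-dependent parallax and rotation, even sub-degree errors can move a projected point by several pixels. This is the misalignment that a calibration-robust fusion model has to tolerate.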

