CV | Oct 13, 2023

Revisiting Multi-modal 3D Semantic Segmentation in Real-world Autonomous Driving

arXiv:2310.08826v1 | 3 citations | h-index: 6
Originality: Incremental advance
AI Analysis

The paper addresses the real-time and safety requirements of autonomous driving, offering an incremental improvement in robustness to LiDAR-camera calibration errors.

The paper tackles two problems in multi-modal 3D semantic segmentation for autonomous driving: efficient deployment, and performance degradation under weak calibration. It proposes CPGNet-LCF, which achieves state-of-the-art performance on the nuScenes and SemanticKITTI benchmarks and runs in 20 ms per frame on a single Tesla V100 GPU.

LiDAR and camera are two critical sensors for multi-modal 3D semantic segmentation and must be fused efficiently and robustly to ensure safety in various real-world scenarios. However, existing multi-modal methods face two key challenges: 1) difficulty with efficient deployment and real-time execution; and 2) drastic performance degradation under weak calibration between LiDAR and cameras. To address these challenges, we propose CPGNet-LCF, a new multi-modal fusion framework extending the LiDAR-only CPGNet. CPGNet-LCF solves the first challenge by inheriting the easy deployment and real-time capabilities of CPGNet. For the second challenge, we introduce a novel weak calibration knowledge distillation strategy during training to improve robustness to weak calibration. CPGNet-LCF achieves state-of-the-art performance on the nuScenes and SemanticKITTI benchmarks. Remarkably, it can be easily deployed to run in 20 ms per frame on a single Tesla V100 GPU using TensorRT FP16 mode. Furthermore, we benchmark performance over four weak calibration levels, demonstrating the robustness of our proposed approach.
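To make the weak-calibration problem concrete, the following is a minimal sketch of how a small error in the LiDAR-to-camera extrinsics shifts the pixel that a 3D point is fused with. It is not the paper's method or benchmark: the pinhole intrinsics, the yaw-only noise model, the sample point, and the error magnitudes are all hypothetical values chosen for illustration.

```python
import math

def rotate_yaw(point, angle_rad):
    """Rotate a camera-frame point (x, y, z) about the vertical (y) axis."""
    x, y, z = point
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x + s * z, y, -s * x + c * z)

def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point (z forward) to pixels."""
    x, y, z = point
    if z <= 0:
        return None  # behind the camera, so there is no valid pixel
    return (fx * x / z + cx, fy * y / z + cy)

def pixel_shift(point, yaw_error_deg, fx=1000.0, fy=1000.0, cx=800.0, cy=450.0):
    """Pixel distance between the projection under the true extrinsics and
    under extrinsics corrupted by a yaw error (hypothetical noise model)."""
    clean = project(point, fx, fy, cx, cy)
    noisy = project(rotate_yaw(point, math.radians(yaw_error_deg)), fx, fy, cx, cy)
    if clean is None or noisy is None:
        return None
    return math.hypot(noisy[0] - clean[0], noisy[1] - clean[1])

# Hypothetical error levels in degrees; NOT the four levels used in the paper.
for err in (0.0, 0.5, 1.0, 2.0):
    shift = pixel_shift((2.0, 0.5, 20.0), err)
    print(f"yaw error {err:.1f} deg -> shift {shift:.1f} px")
```

Under these assumed intrinsics, even a one-degree rotation error moves the projection by well over ten pixels, which is enough to pair a LiDAR point with image features from a neighbouring object. This is the kind of misalignment a fusion model has to tolerate.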
