CV · Nov 14, 2022

Cross-Modality Knowledge Distillation Network for Monocular 3D Object Detection

arXiv:2211.07171v123.1104 citations · h-index: 15 · Has Code

Originality: Highly original

AI Analysis

This work addresses the challenge of accurate 3D detection from single images for autonomous driving, representing an incremental improvement over existing methods.

The paper tackles the problem of monocular 3D object detection by proposing a Cross-Modality Knowledge Distillation (CMKD) network that transfers knowledge from LiDAR to image modalities, achieving state-of-the-art performance with significant gains on KITTI and Waymo datasets.

Leveraging LiDAR-based detectors or real LiDAR point data to guide monocular 3D detection has brought significant improvement, e.g., Pseudo-LiDAR methods. However, existing methods usually apply non-end-to-end training strategies and leverage the LiDAR information insufficiently, so the rich potential of the LiDAR data has not been well exploited. In this paper, we propose the Cross-Modality Knowledge Distillation (CMKD) network for monocular 3D detection to efficiently and directly transfer knowledge from the LiDAR modality to the image modality on both features and responses. Moreover, we extend CMKD into a semi-supervised training framework by distilling knowledge from large-scale unlabeled data, which significantly boosts performance. At the time of submission, CMKD ranks $1^{st}$ among published monocular 3D detectors on both the KITTI $test$ set and the Waymo $val$ set, with significant performance gains over previous state-of-the-art methods.
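To make "distillation on both features and responses" concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the abstract does not name the loss functions, so this sketch assumes mean squared error for the feature term and binary cross-entropy against the teacher's soft scores for the response term. All function names, the flat-list feature layout, and the weights `feat_weight` and `resp_weight` are hypothetical.

```python
import math


def feature_distillation_loss(student_feat, teacher_feat):
    """Mean squared error between two equally sized flat feature vectors.

    Assumption: MSE stands in for whatever feature loss the paper uses.
    """
    if len(student_feat) != len(teacher_feat):
        raise ValueError("feature vectors must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared) / len(squared)


def response_distillation_loss(student_probs, teacher_probs, eps=1e-7):
    """Binary cross-entropy of student scores against teacher soft labels.

    Assumption: the teacher's confidence scores act as soft targets.
    """
    if len(student_probs) != len(teacher_probs):
        raise ValueError("score lists must have the same length")
    total = 0.0
    for p, q in zip(student_probs, teacher_probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= q * math.log(p) + (1.0 - q) * math.log(1.0 - p)
    return total / len(student_probs)


def distillation_loss(student_feat, teacher_feat, student_probs, teacher_probs,
                      feat_weight=1.0, resp_weight=1.0):
    """Weighted sum of the feature and response terms (weights hypothetical)."""
    return (feat_weight * feature_distillation_loss(student_feat, teacher_feat)
            + resp_weight * response_distillation_loss(student_probs, teacher_probs))
```

Both terms depend only on the teacher's outputs, not on ground-truth boxes. Under the assumptions above, that is what would let the same objective run on unlabeled frames, in the spirit of the semi-supervised extension the abstract describes.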
