IMKD: Intensity-Aware Multi-Level Knowledge Distillation for Camera-Radar Fusion
This work addresses the challenge of sensor fusion in autonomous driving by improving detection accuracy without LiDAR at inference, though it is incremental over existing distillation approaches.
The paper tackles the problem of preserving sensor-specific characteristics in radar-camera fusion for 3D object detection. It introduces IMKD, a multi-level knowledge distillation framework that achieves 67.0% NDS and 61.0% mAP on the nuScenes benchmark, outperforming prior distillation-based radar-camera fusion methods.
High-performance radar-camera 3D object detection can be achieved by leveraging knowledge distillation without using LiDAR at inference time. However, existing distillation methods typically transfer modality-specific features directly to each sensor, which can distort their unique characteristics and degrade their individual strengths. To address this, we introduce IMKD, a radar-camera fusion framework based on multi-level knowledge distillation that preserves each sensor's intrinsic characteristics while amplifying their complementary strengths. IMKD applies a three-stage, intensity-aware distillation strategy to enrich the fused representation across the architecture: (1) LiDAR-to-Radar intensity-aware feature distillation, which enhances radar representations with fine-grained structural cues; (2) LiDAR-to-Fused intensity-guided feature distillation, which selectively highlights useful geometry and depth information at the fusion level, fostering complementarity between the modalities rather than forcing them to align; and (3) a Camera-Radar intensity-guided fusion mechanism that facilitates effective feature alignment and calibration. Extensive experiments on the nuScenes benchmark show that IMKD reaches 67.0% NDS and 61.0% mAP, outperforming all prior distillation-based radar-camera fusion methods. Our code and models are available at https://github.com/dfki-av/IMKD/.
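The abstract does not state the loss formulation, so the following is only a minimal sketch of what an intensity-aware feature distillation term could look like, not the authors' implementation. It assumes a single-channel bird's-eye-view grid for the student (radar) and teacher (LiDAR) features, and a per-cell LiDAR intensity map used as a weight on the squared feature difference. The names `normalize_intensity` and `intensity_weighted_distill_loss`, the min-max normalization, and the weighted-MSE form are all hypothetical choices made for illustration.

```python
from typing import List

Grid = List[List[float]]  # single-channel H x W bird's-eye-view map


def normalize_intensity(intensity: Grid, eps: float = 1e-8) -> Grid:
    """Min-max scale an intensity map to [0, 1] (assumed normalization)."""
    flat = [v for row in intensity for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span < eps:
        # Constant intensity: fall back to uniform weights.
        return [[1.0 for _ in row] for row in intensity]
    return [[(v - lo) / span for v in row] for row in intensity]


def intensity_weighted_distill_loss(
    student: Grid, teacher: Grid, intensity: Grid, eps: float = 1e-8
) -> float:
    """Weighted MSE between student and teacher features.

    Cells with higher teacher intensity contribute more, so the student
    is pushed toward the teacher mainly where the intensity map is strong.
    This is an assumed formulation, not taken from the paper.
    """
    weights = normalize_intensity(intensity, eps)
    num = 0.0  # sum of weighted squared errors
    den = 0.0  # sum of weights
    for s_row, t_row, w_row in zip(student, teacher, weights):
        for s, t, w in zip(s_row, t_row, w_row):
            num += w * (s - t) ** 2
            den += w
    return num / (den + eps)


# Hypothetical 2 x 2 example: radar (student) vs. LiDAR (teacher) features.
radar_feat = [[0.0, 0.0], [0.0, 0.0]]
lidar_feat = [[1.0, 1.0], [1.0, 1.0]]
lidar_intensity = [[0.0, 1.0], [2.0, 4.0]]
loss = intensity_weighted_distill_loss(radar_feat, lidar_feat, lidar_intensity)
```

In this sketch, a mismatch in a cell whose normalized intensity is zero does not contribute to the loss. That is one simple way to express the idea of selectively highlighting useful regions rather than forcing full feature alignment.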