CVAug 21, 2025

MapKD: Unlocking Prior Knowledge with Cross-Modal Distillation for Efficient Online HD Map Construction

Ziyang Yan, Ruikai Li, Zhiyong Cui, Bohan Li, Han Jiang, Yilong Ren, Aoyong Li, Zhenning Li, Sijia Wen, Haiyang Yu

arXiv:2508.15653v23 citationsh-index: 12Has Code

Originality Incremental advance

AI Analysis

This work addresses computational overhead in autonomous driving systems by enabling efficient, vision-only map construction, though it is incremental as it builds on prior knowledge distillation methods.

The paper tackles the problem of inefficient online HD map construction in autonomous driving by proposing MapKD, a knowledge distillation framework that transfers knowledge from multimodal models to a lightweight vision-centric student model, resulting in a +6.68 mIoU and +10.94 mAP improvement on the nuScenes dataset.

Online HD map construction is a fundamental task in autonomous driving systems, aiming to acquire semantic information of map elements around the ego vehicle based on real-time sensor inputs. Recently, several approaches have achieved promising results by incorporating offline priors such as SD maps and HD maps or by fusing multi-modal data. However, these methods depend on stale offline maps and multi-modal sensor suites, resulting in avoidable computational overhead at inference. To address these limitations, we employ a knowledge distillation strategy to transfer knowledge from multimodal models with prior knowledge to an efficient, low-cost, and vision-centric student model. Specifically, we propose MapKD, a novel multi-level cross-modal knowledge distillation framework with an innovative Teacher-Coach-Student (TCS) paradigm. This framework consists of: (1) a camera-LiDAR fusion model with SD/HD map priors serving as the teacher; (2) a vision-centric coach model with prior knowledge and simulated LiDAR to bridge the cross-modal knowledge transfer gap; and (3) a lightweight vision-based student model. Additionally, we introduce two targeted knowledge distillation strategies: Token-Guided 2D Patch Distillation (TGPD) for bird's eye view feature alignment and Masked Semantic Response Distillation (MSRD) for semantic learning guidance. Extensive experiments on the challenging nuScenes dataset demonstrate that MapKD improves the student model by +6.68 mIoU and +10.94 mAP while simultaneously accelerating inference speed. The code is available at:https://github.com/2004yan/MapKD2026.

View on arXiv PDF Code

Similar