AI · CV · Jan 29

Drive-KD: Multi-Teacher Distillation for VLMs in Autonomous Driving

arXiv:2601.21288v1 · h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses the efficiency challenge of deploying vision-language models in autonomous driving systems, though the advance is incremental because it builds on existing distillation methods.

The paper tackles the high GPU memory use and inference latency of large vision-language models in autonomous driving. It proposes Drive-KD, a multi-teacher knowledge distillation framework that decomposes driving into perception, reasoning, and planning. On DriveBench, the distilled InternVL3-1B model achieves better overall performance than the pretrained 78B model from the same family, with about 42 times less GPU memory and about 11.4 times higher throughput.

Autonomous driving is an important and safety-critical task, and recent advances in LLMs/VLMs have opened new possibilities for reasoning and planning in this domain. However, large models demand substantial GPU memory and exhibit high inference latency, while conventional supervised fine-tuning (SFT) often struggles to bridge the capability gaps of small models. To address these limitations, we propose Drive-KD, a framework that decomposes autonomous driving into a "perception-reasoning-planning" triad and transfers these capabilities via knowledge distillation. We identify layer-specific attention as the distillation signal to construct capability-specific single-teacher models that outperform baselines. Moreover, we unify these single-teacher settings into a multi-teacher distillation framework and introduce asymmetric gradient projection to mitigate cross-capability gradient conflicts. Extensive evaluations validate the generalization of our method across diverse model families and scales. Experiments show that our distilled InternVL3-1B model, with ~42 times less GPU memory and ~11.4 times higher throughput, achieves better overall performance than the pretrained 78B model from the same family on DriveBench, and surpasses GPT-5.1 on the planning dimension, providing insights toward efficient autonomous driving VLMs.
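The abstract names asymmetric gradient projection but does not define it. As a rough illustration only, the sketch below assumes a PCGrad-style rule in which one capability's gradient is treated as "protected" and left untouched, while any other capability's gradient that conflicts with it (negative dot product) has its conflicting component removed before the gradients are summed. The function names, the choice of which gradient is protected, and the use of flat Python lists are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two flat gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_if_conflicting(
    g: Sequence[float], ref: Sequence[float], eps: float = 1e-12
) -> List[float]:
    """Return g with its component along ref removed, but only when
    g and ref conflict (negative dot product). Otherwise return g as-is."""
    d = dot(g, ref)
    if d >= 0:
        return list(g)
    ref_sq = dot(ref, ref)
    if ref_sq <= eps:
        # ref is (numerically) zero, so there is no direction to project onto.
        return list(g)
    scale = d / ref_sq
    return [gi - scale * ri for gi, ri in zip(g, ref)]


def asymmetric_combine(
    protected: Sequence[float], others: Sequence[Sequence[float]]
) -> List[float]:
    """Sum gradients asymmetrically (assumed rule, not the paper's):
    the protected gradient is never modified; each other gradient is
    projected against it first if the two conflict."""
    total = list(protected)
    for g in others:
        adjusted = project_if_conflicting(g, protected)
        total = [t + a for t, a in zip(total, adjusted)]
    return total


# Hypothetical two-parameter example: the second gradient conflicts
# with the protected one along the first coordinate.
protected_grad = [1.0, 0.0]
other_grad = [-1.0, 1.0]
combined = asymmetric_combine(protected_grad, [other_grad])
```

In this toy case the conflicting component of `other_grad` along `protected_grad` is removed, so the combined update no longer pushes against the protected direction. A symmetric scheme would instead project both gradients against each other.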

