CV · May 23

NudgeVAD: Language-Nudged End-to-End Driving via FiLM Residuals

arXiv:2605.24531 · 54.2
Predicted impact: top 45% in CV · last 90 days
Originality: Incremental advance
AI Analysis

For autonomous driving researchers, this work clarifies that language instructions are most beneficial when the command channel is unreliable, rather than universally additive.

NudgeVAD introduces a frozen-planner residual framework that uses language as a calibrated nudge to improve end-to-end driving, particularly when high-level commands are unreliable. With random commands, NudgeVAD achieves an ADE6s of 2.806 m, outperforming VAD-FT (UNCOND), a compute-matched baseline fine-tuned without language, by 0.312 m.

Natural-language instructions promise controllable end-to-end driving, but their benefit can be hidden when planners already receive reliable high-level commands. We propose NudgeVAD, a frozen-planner residual framework that uses language as a calibrated nudge to a VAD trajectory. With identity-initialized FiLM and a zero-initialized residual head, NudgeVAD is equivalent to the frozen planner at initialization, so learned deviations arise only from language-conditioned residuals. We evaluate NudgeVAD along a command-reliability axis. With reliable commands, language improves the initial planner but becomes nearly redundant once compared against VAD-FT (UNCOND), a compute-matched VAD model fine-tuned without language. With random commands, however, language becomes essential: detaching text degrades ADE6s to 3.166 m, while NudgeVAD with text recovers 2.806 m and outperforms VAD-FT (UNCOND) by 0.312 m. These results show that language is not universally additive; it is most valuable when the categorical command channel is unreliable.
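The initialization property described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `FiLMResidualHead`, the linear FiLM generators, the tiny dimensions, and the flat trajectory vector are all assumptions made for illustration. It shows only why identity-initialized FiLM (scale 1, shift 0) combined with a zero-initialized residual head reproduces the frozen planner's trajectory exactly before any training.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class FiLMResidualHead:
    """Hypothetical language-conditioned residual on top of a frozen planner."""

    def __init__(self, feat_dim, text_dim, out_dim):
        # FiLM generators: gamma = 1 + W_gamma @ text, beta = W_beta @ text.
        # Zero weights give gamma = 1 and beta = 0, i.e. the identity modulation.
        self.W_gamma = [[0.0] * text_dim for _ in range(feat_dim)]
        self.W_beta = [[0.0] * text_dim for _ in range(feat_dim)]
        # Zero-initialized residual head: predicts no deviation at initialization.
        self.W_res = [[0.0] * feat_dim for _ in range(out_dim)]

    def film(self, feat, text):
        """Feature-wise linear modulation of planner features by the text embedding."""
        gamma = [1.0 + g for g in matvec(self.W_gamma, text)]
        beta = matvec(self.W_beta, text)
        return [g * f + b for g, f, b in zip(gamma, feat, beta)]

    def forward(self, base_traj, feat, text):
        """Return the frozen planner's trajectory plus a language-conditioned residual."""
        residual = matvec(self.W_res, self.film(feat, text))
        return [b + r for b, r in zip(base_traj, residual)]


# Assumed toy inputs: a flattened frozen-planner trajectory, planner features,
# and a text embedding.
base = [0.0, 1.0, 0.5, 2.0]
feat = [0.3, -1.2, 0.7]
text = [1.0, -0.5]

head = FiLMResidualHead(feat_dim=3, text_dim=2, out_dim=4)
out_init = head.forward(base, feat, text)   # identical to `base` at initialization

# Simulate a training update to the residual head: the output now deviates.
head.W_res[0][0] = 1.0
out_trained = head.forward(base, feat, text)
```

Because the frozen trajectory passes through unchanged and only the residual path carries gradients, any deviation from the base plan after training is attributable to the language-conditioned branch, which is the property the abstract relies on.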

