CV, AI, LG, RO · Nov 25, 2025

DeeAD: Dynamic Early Exit of Vision-Language Action for Efficient Autonomous Driving

arXiv:2511.20720v1
Originality: Incremental advance
AI Analysis

This work addresses an efficiency bottleneck in autonomous driving systems and offers a practical improvement for real-time applications. It is incremental, however, because it builds on existing VLA models.

The paper tackles the high inference latency of Vision-Language Action models in autonomous driving. It proposes DeeAD, a training-free early-exit framework that accelerates planning by evaluating the feasibility of intermediate trajectories, and reports up to 29% latency reduction while maintaining planning quality and safety.

Vision-Language Action (VLA) models unify perception, reasoning, and trajectory generation for autonomous driving, but suffer from significant inference latency due to deep transformer stacks. We present DeeAD, a training-free, action-guided early-exit framework that accelerates VLA planning by evaluating the physical feasibility of intermediate trajectories. Instead of relying on confidence scores, DeeAD terminates inference when predicted trajectories align with lightweight planning priors (e.g., Navigation or Low-precision Planning) within a tolerable deviation (<2m). To improve efficiency, we introduce a multi-hop controller that adaptively skips redundant layers based on the change rate of scores. DeeAD integrates into existing VLA models, such as ORION, without requiring retraining. Experiments on the Bench2Drive benchmark demonstrate up to 28% transformer-layer sparsity and 29% latency reduction, while preserving planning quality and safety.
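The exit rule and the multi-hop idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of maximum pointwise distance as the deviation score, the hop thresholds, and the toy decoder are all assumptions made for illustration. The abstract only states that inference stops when an intermediate trajectory lies within a tolerable deviation (<2m) of a lightweight planning prior, and that layers are skipped based on the change rate of scores.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]


def max_deviation(pred: Trajectory, prior: Trajectory) -> float:
    """Largest pointwise Euclidean distance (metres) between two trajectories.

    Assumption: the abstract does not define the deviation metric.
    """
    return max(math.dist(p, q) for p, q in zip(pred, prior))


def next_hop(prev_score: float, score: float, max_hop: int = 4) -> int:
    """Hypothetical hop rule: skip more layers while the score barely changes."""
    change = abs(prev_score - score)
    if change < 0.1:
        return max_hop
    if change < 0.5:
        return 2
    return 1


def early_exit_plan(
    decode_at_layer: Callable[[int], Trajectory],
    prior: Trajectory,
    num_layers: int,
    tol_m: float = 2.0,
) -> Tuple[Trajectory, int, List[int]]:
    """Return (trajectory, exit_layer, layers_evaluated)."""
    layer = 1
    prev_score = None
    evaluated: List[int] = []
    while layer < num_layers:
        traj = decode_at_layer(layer)
        score = max_deviation(traj, prior)
        evaluated.append(layer)
        if score < tol_m:
            # Intermediate trajectory agrees with the prior: stop early.
            return traj, layer, evaluated
        hop = 1 if prev_score is None else next_hop(prev_score, score)
        prev_score = score
        layer += hop
    # No early exit: fall back to the full-depth output.
    evaluated.append(num_layers)
    return decode_at_layer(num_layers), num_layers, evaluated


# Toy stand-in for a VLA decoder: the lateral error stays at 3 m
# until layer 10, then drops to 1 m. Purely illustrative.
PRIOR: List[Point] = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]


def toy_decoder(layer: int) -> Trajectory:
    offset = 3.0 if layer < 10 else 1.0
    return [(x, y + offset) for x, y in PRIOR]


trajectory, exit_layer, evaluated = early_exit_plan(toy_decoder, PRIOR, num_layers=32)
print(exit_layer, evaluated)
```

In this toy run the controller checks only a few layers before the deviation drops under the tolerance, which is the source of the layer sparsity. A real system would decode trajectories from the hidden states of a model such as ORION, and would take the prior from navigation or a low-precision planner.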

