CV | Feb 16

DriveFine: Refining-Augmented Masked Diffusion VLA for Precise and Robust Driving

arXiv:2602.14577v1 | 6 citations | h-index: 2 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses modality alignment and generalization issues in autonomous driving planning, offering a novel hybrid approach that could improve safety and performance, though it appears incremental as it builds on existing paradigms.

The paper tackles the complementary weaknesses of diffusion-based and token-based planners in autonomous driving by proposing DriveFine, a masked diffusion VLA model that integrates flexible decoding with self-correction, achieving strong efficacy and robustness on benchmarks like NAVSIM v1, v2, and Navhard.

Vision-Language-Action (VLA) models for autonomous driving increasingly adopt generative planners trained with imitation learning followed by reinforcement learning. Diffusion-based planners suffer from modality alignment difficulties, low training efficiency, and limited generalization. Token-based planners are plagued by cumulative causal errors and irreversible decoding. In summary, the two dominant paradigms exhibit complementary strengths and weaknesses. In this paper, we propose DriveFine, a masked diffusion VLA model that combines flexible decoding with self-correction capabilities. In particular, we design a novel plug-and-play block-MoE, which seamlessly injects a refinement expert on top of the generation expert. By enabling explicit expert selection during inference and gradient blocking during training, the two experts are fully decoupled, preserving the foundational capabilities and generic patterns of the pretrained weights, which highlights the flexibility and extensibility of the block-MoE design. Furthermore, we design a hybrid reinforcement learning strategy that encourages effective exploration of the refinement expert while maintaining training stability. Extensive experiments on the NAVSIM v1, v2, and Navhard benchmarks demonstrate that DriveFine exhibits strong efficacy and robustness. The code will be released at https://github.com/MSunDYY/DriveFine.
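The decoupling described in the abstract (explicit expert selection at inference, gradient blocking during training) can be illustrated with a toy sketch. This is not the DriveFine implementation: the `Expert` and `BlockMoE` classes, the scalar weights, and the squared-error loss are all hypothetical stand-ins, and the sketch assumes the refinement expert consumes a draft produced by the generation expert.

```python
from dataclasses import dataclass


@dataclass
class Expert:
    """Toy scalar expert computing y = w * x (stand-in for a real network block)."""
    w: float
    grad: float = 0.0

    def forward(self, x: float) -> float:
        return self.w * x


class BlockMoE:
    """Hypothetical two-expert block with explicit (non-learned) expert selection."""

    def __init__(self, gen_w: float, ref_w: float) -> None:
        self.experts = {"generation": Expert(gen_w), "refinement": Expert(ref_w)}

    def forward(self, x: float, expert: str) -> float:
        # The caller names the expert; there is no router to train.
        return self.experts[expert].forward(x)

    def refinement_step(self, x: float, target: float) -> float:
        """Accumulate the squared-error gradient for the refinement expert only."""
        ref = self.experts["refinement"]
        draft = self.forward(x, "generation")  # treated as a constant below
        out = self.forward(draft, "refinement")
        d_out = 2.0 * (out - target)           # d(loss)/d(out)
        ref.grad += d_out * draft              # d(out)/d(ref.w) = draft
        # Gradient blocking (assumed form): d_out * ref.w * x would be the
        # generation expert's gradient, but it is deliberately not accumulated.
        return (out - target) ** 2


moe = BlockMoE(gen_w=2.0, ref_w=1.0)
loss = moe.refinement_step(x=1.5, target=4.0)
```

After the step, only the refinement expert has a nonzero gradient, so the generation expert's (pretrained) weight would be left untouched by any optimizer update. In an autograd framework the same effect is usually obtained by detaching the draft before it enters the refinement pass.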

