CV · Feb 25

MindDriver: Introducing Progressive Multimodal Reasoning for Autonomous Driving

arXiv:2602.21952v1 · 3 citations · h-index: 11 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a critical bottleneck in applying vision-language models to autonomous driving, offering an incremental improvement over existing methods.

The paper tackles the challenge of aligning textual reasoning with physical trajectory planning in autonomous driving by proposing MindDriver, a progressive multimodal reasoning framework that achieves superior performance on nuScenes and Bench2Drive benchmarks.

Vision-Language Models (VLMs) exhibit strong reasoning capabilities, showing promise for end-to-end autonomous driving systems. Chain-of-Thought (CoT), the reasoning strategy most widely used with VLMs, faces critical challenges. Existing textual CoT leaves a large gap between the semantic space of text and the physical space of trajectories. Although a recent approach uses future images in place of text as the CoT process, it lacks clear planning-oriented objective guidance for generating images with accurate scene evolution. To address these issues, we propose MindDriver, a progressive multimodal reasoning framework that enables a VLM to imitate human-like progressive thinking for autonomous driving. MindDriver proceeds through three stages: semantic understanding, semantic-to-physical space imagination, and physical-space trajectory planning. To align these reasoning stages, we develop a feedback-guided automatic data annotation pipeline that generates aligned multimodal reasoning training data. Furthermore, we develop a progressive reinforcement fine-tuning method that optimizes the alignment through progressive high-level reward-based learning. MindDriver demonstrates superior performance in both nuScenes open-loop and Bench2Drive closed-loop evaluation. Code is available at https://github.com/hotdogcheesewhite/MindDriver.
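
The three stages named in the abstract (semantic understanding, semantic-to-physical imagination, physical-space trajectory planning) can be pictured as a chain in which each stage conditions on the outputs of the earlier ones. The sketch below is only an illustration of that data flow, not MindDriver's implementation: `ReasoningTrace`, `progressive_reasoning`, and the three `toy_*` stand-ins are hypothetical names, the "imagined scene" is a small dictionary rather than a generated image, and the observation fields are invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Waypoint = Tuple[float, float]  # (x, y) in metres; coordinate convention is assumed


@dataclass
class ReasoningTrace:
    """Outputs of the three stages, kept together for inspection."""
    semantic_text: str              # stage 1: textual scene understanding
    imagined_scene: Dict[str, Any]  # stage 2: stand-in for an imagined future image
    trajectory: List[Waypoint]      # stage 3: planned waypoints


def progressive_reasoning(
    observation: Dict[str, Any],
    understand: Callable[[Dict[str, Any]], str],
    imagine: Callable[[Dict[str, Any], str], Dict[str, Any]],
    plan: Callable[[Dict[str, Any], str, Dict[str, Any]], List[Waypoint]],
) -> ReasoningTrace:
    """Run the stages in order; each later stage sees every earlier output."""
    text = understand(observation)
    scene = imagine(observation, text)
    trajectory = plan(observation, text, scene)
    return ReasoningTrace(text, scene, trajectory)


# Hypothetical toy stages, used only to make the sketch runnable.
def toy_understand(obs: Dict[str, Any]) -> str:
    return f"lead vehicle {obs['lead_distance_m']:.0f} m ahead; keep lane"


def toy_imagine(obs: Dict[str, Any], text: str) -> Dict[str, Any]:
    # Predict the gap to the lead vehicle one second ahead.
    return {"lead_distance_m": obs["lead_distance_m"] - obs["closing_speed_mps"] * 1.0}


def toy_plan(obs: Dict[str, Any], text: str, scene: Dict[str, Any]) -> List[Waypoint]:
    # Halve the speed if the imagined gap is short, otherwise hold speed.
    speed = obs["ego_speed_mps"]
    if scene["lead_distance_m"] <= 10.0:
        speed /= 2.0
    return [(speed * t, 0.0) for t in (1.0, 2.0, 3.0)]


observation = {"lead_distance_m": 30.0, "closing_speed_mps": 2.0, "ego_speed_mps": 10.0}
trace = progressive_reasoning(observation, toy_understand, toy_imagine, toy_plan)
```

Passing the stages in as callables keeps the ordering explicit and lets each stage be swapped independently; in the paper all three are produced by a single VLM, which this sketch does not attempt to model.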

