RO, CV, LG, MM, SY · Jul 7, 2025

NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving

arXiv:2507.05227v1 · 9 citations · h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses the limited navigational context available to autonomous driving systems, aiming for more reliable and safer navigation in complex environments. It appears incremental because it builds on existing vision-language models and methods.

The paper tackles the gap between local perception and global navigation in autonomous driving by proposing NavigScene, a navigation-guided natural language dataset, and three paradigms for integrating it. The authors report significant performance improvements across perception, prediction, planning, and question-answering tasks.

Autonomous driving systems have made significant advances in Q&A, perception, prediction, and planning based on local visual information, yet they struggle to incorporate broader navigational context that human drivers routinely utilize. We address this critical gap between local sensor data and global navigation information by proposing NavigScene, an auxiliary navigation-guided natural language dataset that simulates a human-like driving environment within autonomous driving systems. Moreover, we develop three complementary paradigms to leverage NavigScene: (1) Navigation-guided Reasoning, which enhances vision-language models by incorporating navigation context into the prompting approach; (2) Navigation-guided Preference Optimization, a reinforcement learning method that extends Direct Preference Optimization to improve vision-language model responses by establishing preferences for navigation-relevant summarized information; and (3) Navigation-guided Vision-Language-Action model, which integrates navigation guidance and vision-language models with conventional driving models through feature fusion. Extensive experiments demonstrate that our approaches significantly improve performance across perception, prediction, planning, and question-answering tasks by enabling reasoning capabilities beyond visual range and improving generalization to diverse driving scenarios. This work represents a significant step toward more comprehensive autonomous driving systems capable of navigating complex, unfamiliar environments with greater reliability and safety.
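The second paradigm is described as extending Direct Preference Optimization (DPO). As background, here is a minimal sketch of the standard DPO loss for a single preference pair. It is not the paper's navigation-guided extension, whose details the abstract does not give. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    either the trainable policy or the frozen reference model.
    Hypothetical helper: this is NOT the paper's navigation-guided variant.
    """
    # How much more the policy favours each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # Scaled preference logit: positive when the policy prefers the chosen
    # response more strongly than the reference model does.
    logit = beta * (chosen_margin - rejected_margin)

    # -log(sigmoid(logit)) written as a numerically stable softplus(-logit).
    if logit >= 0:
        return math.log1p(math.exp(-logit))
    return -logit + math.log1p(math.exp(logit))


# Usage: the policy favours the chosen answer more than the reference does,
# so the loss falls below log(2), the value at zero preference margin.
loss = dpo_loss(-1.0, -5.0, -2.0, -4.0)
print(loss < math.log(2))
```

In a navigation-guided setting, the chosen response would presumably be the one that better reflects the navigation-relevant summary, but how the paper constructs those pairs is not stated in the abstract.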

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
