RO · AI · LG · Aug 17, 2024

V2X-VLM: End-to-End V2X Cooperative Autonomous Driving Through Large Vision-Language Models

arXiv:2408.09251v3 · 51 citations · h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses perception limitations in autonomous driving, with the aim of improving safety and decision-making. It appears incremental because it builds on existing V2X and VLM paradigms.

The paper tackles the challenge of fusing heterogeneous visual and semantic information for cooperative autonomous driving. It introduces V2X-VLM, an end-to-end framework based on vision-language models, and reports state-of-the-art trajectory planning accuracy, with lower L2 error and collision rate than existing cooperative autonomous driving baselines.

Vehicle-to-everything (V2X) cooperation has emerged as a promising paradigm to overcome the perception limitations of classical autonomous driving by leveraging information from both ego-vehicle and infrastructure sensors. However, effectively fusing heterogeneous visual and semantic information while ensuring robust trajectory planning remains a significant challenge. This paper introduces V2X-VLM, a novel end-to-end (E2E) cooperative autonomous driving framework based on vision-language models (VLMs). V2X-VLM integrates multiperspective camera views from vehicles and infrastructure with text-based scene descriptions to enable a more comprehensive understanding of driving environments. Specifically, we propose a contrastive learning-based mechanism to reinforce the alignment of heterogeneous visual and textual characteristics, which enhances the semantic understanding of complex driving scenarios, and employ a knowledge distillation strategy to stabilize training. Experiments on a large real-world dataset demonstrate that V2X-VLM achieves state-of-the-art trajectory planning accuracy, significantly reducing L2 error and collision rate compared to existing cooperative autonomous driving baselines. Ablation studies validate the contributions of each component. Moreover, the evaluation of robustness and efficiency highlights the practicality of V2X-VLM for real-world deployment to enhance overall autonomous driving safety and decision-making.
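
The abstract does not spell out the form of the contrastive alignment objective. As a rough illustration only, the sketch below shows a generic symmetric InfoNCE-style loss of the kind commonly used to align paired image and text embeddings. It is an assumption, not the authors' formulation: the function names, the temperature value of 0.07, and the toy two-dimensional embeddings are all hypothetical.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diagonal(logits):
    # Mean cross-entropy where the correct "class" for row i is column i,
    # i.e. the i-th image is paired with the i-th text.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def contrastive_alignment_loss(image_feats, text_feats, temperature=0.07):
    # Hypothetical symmetric InfoNCE-style loss over a batch of paired
    # embeddings; temperature=0.07 is an illustrative default only.
    img = [l2_normalize(v) for v in image_feats]
    txt = [l2_normalize(v) for v in text_feats]
    # Cosine similarity of every image with every text, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txt]
              for i in img]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diagonal(logits)
                  + cross_entropy_diagonal(logits_t))

# Toy embeddings: matched pairs should give a much lower loss than mismatched ones.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_alignment_loss(images, [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_alignment_loss(images, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

Minimizing a loss of this shape pulls each scene's visual embedding toward its own text description and pushes it away from the descriptions of other scenes in the batch, which is the general idea behind contrastive alignment of visual and textual features.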

