V2X-UniPool: Unifying Multimodal Perception and Knowledge Reasoning for Autonomous Driving
This work addresses scalable and trustworthy autonomous driving across vehicles and infrastructure by integrating multimodal V2X data with knowledge reasoning, though the contribution appears incremental because it combines existing techniques such as RAG with V2X.
The paper tackles the limitations of single-vehicle perception in autonomous driving by proposing V2X-UniPool, a framework that unifies V2X perception with language-based reasoning, achieving state-of-the-art planning accuracy and safety while reducing communication cost by over 80% on the DAIR-V2X dataset.
Autonomous driving (AD) has achieved significant progress, yet single-vehicle perception remains constrained by sensing range and occlusions. Vehicle-to-Everything (V2X) communication addresses these limits by enabling collaboration across vehicles and infrastructure, but it also faces heterogeneity, synchronization, and latency constraints. Language models offer strong knowledge-driven reasoning and decision-making capabilities, but they are not inherently designed to process raw sensor streams and are prone to hallucination. We propose V2X-UniPool, the first framework that unifies V2X perception with language-based reasoning for knowledge-driven AD. It transforms multimodal V2X data into structured, language-based knowledge, organizes it in a time-indexed knowledge pool for temporally consistent reasoning, and employs Retrieval-Augmented Generation (RAG) to ground decisions in real-time context. Experiments on the real-world DAIR-V2X dataset show that V2X-UniPool achieves state-of-the-art planning accuracy and safety while reducing communication cost by more than 80%, the lowest overhead among evaluated methods. These results highlight the promise of bridging V2X perception and language reasoning to advance scalable and trustworthy driving. Our code is available at: https://github.com/Xuewen2025/V2X-UniPool
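The idea of a time-indexed knowledge pool queried at decision time can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `KnowledgeEntry`, `TimeIndexedPool`, and `build_prompt`, the timestamp unit, and the sliding-window retrieval rule are all assumptions made for illustration, and a real RAG pipeline would likely rank entries by relevance rather than by time alone.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class KnowledgeEntry:
    """Hypothetical record: one V2X observation rendered as text."""
    timestamp: float  # seconds (assumed unit)
    source: str       # e.g. "vehicle" or "infrastructure"
    text: str         # language description of the observation


class TimeIndexedPool:
    """Hypothetical pool that keeps entries sorted by timestamp."""

    def __init__(self):
        self._times = []    # sorted timestamps, parallel to _entries
        self._entries = []

    def add(self, entry):
        # Insert in timestamp order so range queries can use binary search.
        i = bisect_right(self._times, entry.timestamp)
        self._times.insert(i, entry.timestamp)
        self._entries.insert(i, entry)

    def retrieve(self, now, window):
        # Return entries with timestamp in [now - window, now], oldest first.
        lo = bisect_left(self._times, now - window)
        hi = bisect_right(self._times, now)
        return self._entries[lo:hi]


def build_prompt(pool, now, window, question):
    # Ground the question in recent context before it reaches a language model.
    lines = [
        f"[t={e.timestamp:.1f}s, {e.source}] {e.text}"
        for e in pool.retrieve(now, window)
    ]
    return "Context:\n" + "\n".join(lines) + "\n\nQuestion: " + question


pool = TimeIndexedPool()
pool.add(KnowledgeEntry(2.0, "vehicle", "Lead car braking ahead."))
pool.add(KnowledgeEntry(1.0, "infrastructure", "Pedestrian at crosswalk."))
pool.add(KnowledgeEntry(5.0, "vehicle", "Lane is clear."))
prompt = build_prompt(pool, now=2.5, window=2.0, question="Plan the next move.")
```

Under these assumptions, only the text entries inside the time window are sent to the reasoning step, which is one way a structured pool could keep communication and prompt size small compared with sharing raw sensor data.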