RO AI CL CV | Oct 3, 2023

Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving

Long Chen, Oleg Sinavski, Jan Hünermann, Alice Karnsund, Andrew James Willmott, Danny Birch, Daniel Maund, Jamie Shotton

CMU

arXiv:2310.01957v2 | 42.1343 citations | h-index: 8 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for explainable and generalizable autonomous driving systems, though its advance appears incremental because it builds on existing LLM capabilities.

The authors aim to improve context understanding in autonomous driving. They introduce an object-level multimodal LLM architecture that fuses vectorized numeric modalities with a pre-trained LLM, along with a new dataset of 160k QA pairs. The resulting model demonstrates proficiency in interpreting driving scenarios, answering questions, and decision-making.

Large Language Models (LLMs) have shown promise in the autonomous driving sector, particularly in generalization and interpretability. We introduce a unique object-level multimodal LLM architecture that merges vectorized numeric modalities with a pre-trained LLM to improve context understanding in driving situations. We also present a new dataset of 160k QA pairs derived from 10k driving scenarios, paired with high-quality control commands collected with an RL agent and question-answer pairs generated by a teacher LLM (GPT-3.5). A distinct pretraining strategy is devised to align numeric vector modalities with static LLM representations using vector captioning language data. We also introduce an evaluation metric for Driving QA and demonstrate our LLM-driver's proficiency in interpreting driving scenarios, answering questions, and decision-making. Our findings highlight the potential of LLM-based driving action generation in comparison to traditional behavioral cloning. We make our benchmark, datasets, and model available for further exploration.
