RO · AI · LG · SY · Feb 28, 2025

SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models

Jiawei Zhang, Xuan Yang, Taiqi Wang, Yu Yao, Aleksandr Petiushko, Bo Li

arXiv:2503.00211v2 · 21.220 citations · h-index: 11 · Has Code · ICML

Originality: Incremental advance

AI Analysis

This work addresses safety and reliability in autonomous driving for real-world deployment. It is an incremental advance that combines existing techniques such as probabilistic graphical models and retrieval-augmented generation.

The paper tackles the gap between high-level reasoning and low-level control in autonomous driving systems, a gap that often leads to unsafe behavior. It proposes SafeAuto, a framework that enhances multimodal large language models (MLLMs) with structured and unstructured knowledge, and reports that it outperforms existing baselines across multiple datasets, yielding more accurate and safer driving.

Traditional autonomous driving systems often struggle to connect high-level reasoning with low-level control, leading to suboptimal and sometimes unsafe behaviors. Recent advances in multimodal large language models (MLLMs), which process both visual and textual data, offer an opportunity to unify perception and reasoning. However, effectively embedding precise safety knowledge into MLLMs for autonomous driving remains a significant challenge. To address this, we propose SafeAuto, a framework that enhances MLLM-based autonomous driving by incorporating both unstructured and structured knowledge. First, we introduce a Position-Dependent Cross-Entropy (PDCE) loss to improve low-level control signal predictions when values are represented as text. Second, to explicitly integrate safety knowledge, we develop a reasoning component that translates traffic rules into first-order logic (e.g., "red light $\implies$ stop") and embeds them into a probabilistic graphical model (e.g., a Markov Logic Network, MLN) to verify predicted actions using recognized environmental attributes. Additionally, our Multimodal Retrieval-Augmented Generation (RAG) model leverages video, control signals, and environmental attributes to learn from past driving experiences. Integrating PDCE, MLN, and Multimodal RAG, SafeAuto outperforms existing baselines across multiple datasets, enabling more accurate, reliable, and safer autonomous driving. The code is available at https://github.com/AI-secure/SafeAuto.
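The abstract names the Position-Dependent Cross-Entropy (PDCE) loss without giving its formula. Below is a minimal sketch of one plausible reading, assuming each control value is emitted as a sequence of digit tokens: every digit's one-hot target is replaced by a Gaussian-smoothed distribution over 0-9 (so a numerically nearby digit is penalized less), and each position's loss is weighted by a factor that decays from the most significant digit. The names `pdce_loss`, `sigma`, and `decay` are hypothetical, and this is not the paper's exact formulation.

```python
import math
from typing import List, Sequence


def soft_digit_target(true_digit: int, sigma: float = 1.0) -> List[float]:
    """Gaussian-smoothed target over digits 0-9, peaked at true_digit,
    so numerically closer digits receive more target mass."""
    weights = [math.exp(-((d - true_digit) ** 2) / (2 * sigma ** 2)) for d in range(10)]
    total = sum(weights)
    return [w / total for w in weights]


def position_weights(num_digits: int, decay: float = 0.5) -> List[float]:
    """Weight 1 for the most significant digit, multiplied by `decay` per later digit."""
    return [decay ** i for i in range(num_digits)]


def pdce_loss(pred_probs: Sequence[Sequence[float]], target_digits: Sequence[int],
              sigma: float = 1.0, decay: float = 0.5) -> float:
    """Hypothetical position-weighted, distance-aware cross-entropy over digit tokens.

    pred_probs[i] is the model's probability distribution over digits 0-9 at
    digit position i; target_digits[i] is the ground-truth digit there.
    """
    if len(pred_probs) != len(target_digits):
        raise ValueError("one prediction per target digit is required")
    weights = position_weights(len(target_digits), decay)
    total = 0.0
    for w, probs, digit in zip(weights, pred_probs, target_digits):
        target = soft_digit_target(digit, sigma)
        # Cross-entropy between the soft target and the prediction (clamped to avoid log(0)).
        ce = -sum(t * math.log(max(p, 1e-12)) for t, p in zip(target, probs))
        total += w * ce
    return total / sum(weights)
```

With confident (peaked) predictions for a target of "35", predicting "45" costs more than predicting "36" (the leading digit matters more), and predicting "85" costs more than "45" (distance matters), which plain token-level cross-entropy would not distinguish.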
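To make the rule-verification step concrete, here is a toy propositional Markov Logic Network, assuming the environmental attributes have already been recognized as booleans and exactly one action is chosen per scene. Each weighted formula contributes its weight when satisfied, so P(action | attributes) ∝ exp(Σ w_i · f_i). The predicates, actions, weights, and the `verify` threshold are hypothetical illustrations, not SafeAuto's actual rule set.

```python
import math
from typing import Callable, Dict, List, Tuple

Attrs = Dict[str, bool]
# A ground rule: (weight, formula), where formula(attrs, action) says whether it is satisfied.
Rule = Tuple[float, Callable[[Attrs, str], bool]]

# Hypothetical action set; exactly one action is assumed to be true per scene.
ACTIONS: List[str] = ["stop", "go", "slow_down"]

# Hypothetical weighted rules; an implication "A => B" is encoded as (not A) or B.
RULES: List[Rule] = [
    (5.0, lambda a, act: (not a["red_light"]) or act == "stop"),       # red_light => stop
    (4.0, lambda a, act: (not a["pedestrian_ahead"]) or act != "go"),  # pedestrian_ahead => not go
    (1.5, lambda a, act: (not a["green_light"]) or act == "go"),       # green_light => go (soft)
]


def action_distribution(attrs: Attrs, actions: List[str] = ACTIONS,
                        rules: List[Rule] = RULES) -> Dict[str, float]:
    """P(action | attrs) proportional to exp(sum of weights of satisfied rules)."""
    scores = {act: sum(w for w, f in rules if f(attrs, act)) for act in actions}
    top = max(scores.values())  # subtract the max score for numerical stability
    unnorm = {act: math.exp(s - top) for act, s in scores.items()}
    z = sum(unnorm.values())
    return {act: v / z for act, v in unnorm.items()}


def verify(predicted: str, attrs: Attrs, threshold: float = 0.5) -> str:
    """Keep the MLLM's predicted action if the MLN finds it probable enough,
    otherwise fall back to the MLN's most probable action."""
    dist = action_distribution(attrs)
    if dist.get(predicted, 0.0) >= threshold:
        return predicted
    return max(dist, key=dist.get)
```

For example, `verify("go", {"red_light": True, "green_light": False, "pedestrian_ahead": False})` returns `"stop"`. Because the evidence is fully observed and only the action varies, exact inference here reduces to a softmax over candidate actions; a realistic rule base with unobserved predicates would need approximate MLN inference.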
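The multimodal RAG component retrieves similar past driving experiences. A minimal sketch, assuming each modality (video, control signals, environmental attributes) has already been embedded into a vector by some encoder: each embedding is L2-normalized, scaled by a per-modality weight, concatenated, and compared by cosine similarity to return the top-k stored experiences. The `Experience` record, the fusion scheme, and the weights are assumptions for illustration; the paper's actual retriever may differ.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Experience:
    """One stored driving episode (hypothetical schema)."""
    video_emb: List[float]    # precomputed embedding of the video clip
    control_emb: List[float]  # precomputed embedding of the control signals
    attr_emb: List[float]     # precomputed embedding of environmental attributes
    description: str          # text to place in the MLLM prompt, e.g. past action and controls


def _l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(video: Sequence[float], control: Sequence[float], attrs: Sequence[float],
         weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> List[float]:
    """Normalize each modality, scale it by its weight, concatenate, and renormalize."""
    fused: List[float] = []
    for w, part in zip(weights, (video, control, attrs)):
        fused.extend(w * x for x in _l2_normalize(part))
    return _l2_normalize(fused)


def retrieve(query: Tuple[Sequence[float], Sequence[float], Sequence[float]],
             memory: List[Experience], k: int = 2,
             weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> List[Experience]:
    """Return the k stored experiences with the highest cosine similarity to the query."""
    q = fuse(*query, weights=weights)
    scored = []
    for item in memory:
        e = fuse(item.video_emb, item.control_emb, item.attr_emb, weights)
        # Both vectors are unit-norm, so the dot product equals cosine similarity.
        scored.append((sum(a * b for a, b in zip(q, e)), item))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:k]]
```

The retrieved `description` strings would then be added to the MLLM prompt as in-context examples; how SafeAuto formats them is not reproduced in this sketch.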
