RO, AI, CV | Jul 31, 2025

A Unified Perception-Language-Action Framework for Adaptive Autonomous Driving

arXiv:2507.23540v1
3 citations
h-index: 6
2025 3rd International Conference on Foundation and Large Language Models (FLLM)
Originality: Incremental advance
AI Analysis

This paper addresses fragmented architectures and limited generalization in autonomous driving, with the aim of improving safety and interpretability. It appears incremental because it builds on existing LLM and VLA methods.

The paper tackles adaptability, robustness, and interpretability in autonomous driving by proposing a unified Perception-Language-Action framework that integrates multi-sensor fusion with an LLM-augmented architecture. The authors report superior performance in trajectory tracking, speed prediction, and adaptive planning on a single scenario: an urban intersection with a construction zone.

Autonomous driving systems face significant challenges in achieving human-like adaptability, robustness, and interpretability in complex, open-world environments. These challenges stem from fragmented architectures, limited generalization to novel scenarios, and insufficient semantic extraction from perception. To address these limitations, we propose a unified Perception-Language-Action (PLA) framework that integrates multi-sensor fusion (cameras, LiDAR, radar) with a large language model (LLM)-augmented Vision-Language-Action (VLA) architecture, specifically a GPT-4.1-powered reasoning core. This framework unifies low-level sensory processing with high-level contextual reasoning, tightly coupling perception with natural language-based semantic understanding and decision-making to enable context-aware, explainable, and safety-bounded autonomous driving. Evaluations on an urban intersection scenario with a construction zone demonstrate superior performance in trajectory tracking, speed prediction, and adaptive planning. The results highlight the potential of language-augmented cognitive frameworks for advancing the safety, interpretability, and scalability of autonomous driving systems.
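
The perception, language, and action stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (Detection, Action, fuse, describe, stub_reasoner, safety_bound, pla_step) is hypothetical, the fusion step is a crude nearest-report rule, the reasoner is a hard-coded stand-in for the LLM call, and the speed limit and per-step speed change are assumed values chosen only to show how a proposal might be bounded.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str          # e.g. "construction_cone"
    distance_m: float   # range to the object in metres
    source: str         # "camera", "lidar", or "radar"


@dataclass
class Action:
    target_speed_mps: float
    rationale: str      # natural-language explanation kept for interpretability


def fuse(detections):
    """Keep the nearest report per label across sensors (a stand-in for real fusion)."""
    nearest = {}
    for d in detections:
        if d.label not in nearest or d.distance_m < nearest[d.label].distance_m:
            nearest[d.label] = d
    return sorted(nearest.values(), key=lambda d: d.distance_m)


def describe(fused, ego_speed_mps):
    """Turn fused detections into a short text scene description."""
    parts = [f"{d.label} at {d.distance_m:.0f} m" for d in fused]
    return f"Ego speed {ego_speed_mps:.1f} m/s. Objects: " + "; ".join(parts) + "."


def stub_reasoner(scene_text):
    """Placeholder for the LLM reasoning core: a fixed rule, not a model."""
    if "construction" in scene_text:
        return Action(3.0, "Construction zone ahead: slow down.")
    # Deliberately above the assumed limit so the safety bound has something to clamp.
    return Action(30.0, "Road clear: proceed at cruise speed.")


def safety_bound(action, ego_speed_mps, speed_limit_mps=13.9, max_delta_mps=2.0):
    """Clamp the proposal to an assumed speed limit and per-step speed change."""
    low = max(0.0, ego_speed_mps - max_delta_mps)
    high = min(speed_limit_mps, ego_speed_mps + max_delta_mps)
    bounded = min(max(action.target_speed_mps, low), high)
    return Action(bounded, action.rationale)


def pla_step(detections, ego_speed_mps, reasoner=stub_reasoner):
    """One perception -> language -> action cycle."""
    scene = describe(fuse(detections), ego_speed_mps)
    return safety_bound(reasoner(scene), ego_speed_mps)


if __name__ == "__main__":
    cones = [
        Detection("construction_cone", 25.0, "camera"),
        Detection("construction_cone", 24.2, "lidar"),
    ]
    print(pla_step(cones, ego_speed_mps=10.0))
```

The design point the sketch tries to capture is that the language stage only proposes an action; a separate deterministic bound decides what is actually commanded, so an implausible proposal cannot exceed the limits.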
