RO · AI · May 16

Pedestrian-Aware LLM-Driven Behavioral Planning for Autonomous Vehicles

arXiv:2605.16858 · 31.0
Predicted impact: top 65% in RO · last 90 days · Originality: Incremental advance
AI Analysis

For autonomous vehicle researchers, this work addresses the challenge of generalizing to out-of-distribution pedestrian behaviors in dense urban environments, offering a more interpretable and safer alternative to RL-based systems.

The paper introduces an LLM-based behavioral planning framework for autonomous vehicles that uses natural-language reasoning to handle unpredictable pedestrian interactions. In zero-shot evaluation, it achieves 68% collision-free success, outperforming deep RL baselines (17.7%), and with few-shot memory reaches 96.0% in single-pedestrian scenarios.

Autonomous Vehicles (AVs) must make reliable decisions in dense urban environments where pedestrian behavior is variable, sometimes abnormal, and often unseen during training. Reinforcement learning (RL)-based AV control systems perform well in structured traffic but struggle to generalize to unpredictable pedestrian interactions and out-of-distribution scenarios. Their reliance on handcrafted rewards and opaque decisions further limits their suitability for safety-critical, pedestrian-rich environments. To address these limitations, we introduce a Large Language Model (LLM)-based decision-making framework for pedestrian-aware behavioral planning. The system converts structured scene observations into natural-language reasoning prompts, enabling the LLM to infer pedestrian intent, anticipate risk, and generate cautious tactical driving decisions. These decisions are executed by a motion planner that ensures smooth, kinematically feasible control. We evaluate the framework in SUMO across multiple pedestrian-interaction scenarios, including unexpected jaywalking, turn-back crossing, hesitation, and bidirectional crossing. In zero-shot evaluation, the LLM-based agent achieves a 68% collision-free success rate, substantially outperforming deep RL baselines (17.7%). With few-shot episodic memory in a single-pedestrian scenario, performance increases to 96.0%, exceeding a custom DQN controller (82.0%). Cross-behavior evaluation further shows that memory derived from turn-back interactions transfers to unseen hesitation and bidirectional crossing scenarios, achieving 82.0% and 90.0% success, respectively. The system consistently initiates earlier responses, maintains wider safety buffers, and produces interpretable, human-aligned decisions.
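The abstract's step of turning structured scene observations into a natural-language reasoning prompt can be illustrated with a minimal sketch. This is not the authors' implementation: the `Pedestrian` and `Scene` fields, the `ACTIONS` set, the prompt wording, and the `parse_action` fallback are all assumptions made for illustration, and no real LLM or SUMO call is made.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Pedestrian:
    """Hypothetical per-pedestrian observation; field names are illustrative."""
    ped_id: str
    distance_m: float        # longitudinal distance ahead of the ego vehicle
    lateral_offset_m: float  # signed offset from the ego lane centre
    speed_mps: float
    heading: str             # e.g. "toward_lane", "away_from_lane"


@dataclass
class Scene:
    """Hypothetical structured scene observation."""
    ego_speed_mps: float
    pedestrians: List[Pedestrian]


# Assumed tactical action set; the abstract does not enumerate the actions.
ACTIONS = ("ACCELERATE", "MAINTAIN", "DECELERATE", "STOP")


def build_prompt(scene: Scene) -> str:
    """Render the scene as plain-language text, nearest pedestrian first."""
    lines = [
        "You are the behavioral planner of an autonomous vehicle.",
        f"Ego speed: {scene.ego_speed_mps:.1f} m/s.",
    ]
    if not scene.pedestrians:
        lines.append("No pedestrians detected.")
    for p in sorted(scene.pedestrians, key=lambda q: q.distance_m):
        lines.append(
            f"Pedestrian {p.ped_id}: {p.distance_m:.1f} m ahead, "
            f"{p.lateral_offset_m:+.1f} m lateral offset, "
            f"moving at {p.speed_mps:.1f} m/s, heading {p.heading}."
        )
    lines.append(
        "Infer each pedestrian's intent, assess the collision risk, and "
        "answer with exactly one action from: " + ", ".join(ACTIONS) + "."
    )
    return "\n".join(lines)


def parse_action(reply: str) -> str:
    """Return the last action named in the reply, or STOP if none is found.

    Taking the last mention assumes the model reasons first and answers last;
    defaulting to STOP is an assumed conservative fallback.
    """
    pattern = r"\b(" + "|".join(ACTIONS) + r")\b"
    matches = re.findall(pattern, reply.upper())
    return matches[-1] if matches else "STOP"
```

In a full pipeline, the string from `build_prompt` would be sent to the LLM and the parsed action handed to the motion planner. Falling back to the most cautious action on an unparseable reply is one design choice among several.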

