From Prompts to Pavement Through Time: Temporal Grounding in Agentic Scene-to-Plan Reasoning

Ahmed Y. Gado, Omar Y. Goba, Alaa Hassanein, Catherine M. Elias, Ahmed Hussein

arXiv:2605.198241.1

AI Analysis

For autonomous vehicle researchers using LLMs, this work clarifies the limits of prompt-based temporal grounding and establishes what the authors describe as the first empirical benchmark for temporal scene-to-plan reasoning.

The paper investigates whether temporal conditioning in communication between LLM/LMM-based agents improves reasoning coherence for autonomous vehicle planning. Results show no statistically significant improvement in standard NLP-based correctness metrics, but qualitative analysis reveals predictive hazard reasoning and stable corrective behavior.

Recent attempts to support high-level scene interpretation and planning in Autonomous Vehicles (AVs) using ensembles of Large Language Models (LLMs) and Large Multimodal Models (LMMs) continue to treat time as a secondary property. This lack of temporal grounding leads to inconsistencies in reasoning about continuous actions, undermining both safety and interpretability. This work explores whether temporal conditioning within inter-agent communication can preserve or enhance coherence without introducing degradation in semantic or logical consistency. To investigate this, we introduce three planner architectures with progressively increasing temporal integration and evaluate them on curated subsets of the BDD-X dataset using semantic, syntactic, and logical metrics. Results show that while temporal conditioning reshapes reasoning style, it yields no statistically significant improvements in standard NLP-based correctness metrics. However, qualitative analysis reveals predictive hazard reasoning, stable corrective behavior, and strategic divergence in the Sentinel. These findings clarify the limits of prompt-based temporal grounding and establish the first empirical benchmark for temporal scene-to-plan reasoning.
