Is VLA Reasoning Faithful? Probing Safety of Chain-of-Causation

Nicanor Mayumu, Xiaoheng Deng, Patrick Mukala

arXiv:2605.17268


AI Analysis

The paper identifies critical safety issues in Vision-Language-Action (VLA) models for autonomous driving, showing that their stated reasoning is often unfaithful to the scene and inconsistent with the actions they take.


We present the first systematic study of faithfulness in Vision-Language-Action (VLA) driving models, analyzing 300 Alpamayo-R1-10B inferences across 100 diverse PhysicalAI-AV scenarios. Our main finding is that the natural-language rationales these models output alongside their trajectories may be significantly unfaithful: (i) overall reasoning fidelity is only 42.5%, meaning the Chain-of-Causation matches scene reality less than half the time; (ii) 94 pedestrians are missed in one-third of pedestrian-relevant scenes; (iii) trajectory fragility reaches 97.7% under mild visual perturbations; and (iv) mean reasoning-action consistency is only 48.3%, with 53.3% of inferences exhibiting low consistency, including 37.9% of cases in which the rationale claims a stop but the model continues instead. We formalize faithfulness information-theoretically, define entity and action fidelity with verification criteria, and outline a four-component safety architecture aligned with these results.
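The abstract names entity fidelity, missed pedestrians, reasoning-action consistency and trajectory fragility but does not spell out how each is computed. The sketch below shows one plausible way to score inferences along those lines. Everything in it is a hypothetical assumption rather than the authors' method: the `Inference` record layout, matching entities by string ID, the `"pedestrian"` ID prefix, the `"stop"` action label, and the 1.0 m fragility threshold.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Inference:
    """One VLA inference (hypothetical record layout, not the paper's format)."""
    mentioned: set[str]   # entity IDs named in the Chain-of-Causation rationale
    in_scene: set[str]    # entity IDs annotated as present in the scene
    claimed_action: str   # action the rationale states, e.g. "stop"
    executed_action: str  # action implied by the predicted trajectory


def entity_fidelity(inf: Inference) -> float:
    """Share of mentioned entities that really exist in the scene.

    Hypothetical convention: an empty rationale scores 1.0 (nothing hallucinated).
    """
    if not inf.mentioned:
        return 1.0
    return len(inf.mentioned & inf.in_scene) / len(inf.mentioned)


def missed_pedestrians(inf: Inference) -> int:
    """Count annotated pedestrians that the rationale never mentions.

    Assumes (hypothetically) that pedestrian IDs start with "pedestrian".
    """
    return sum(1 for e in inf.in_scene - inf.mentioned if e.startswith("pedestrian"))


def stop_claim_violation_rate(infs: list[Inference]) -> float:
    """Among inferences whose rationale claims "stop", the share that do not stop."""
    stop_claims = [i for i in infs if i.claimed_action == "stop"]
    if not stop_claims:
        return 0.0
    return sum(i.executed_action != "stop" for i in stop_claims) / len(stop_claims)


def is_fragile(
    base: list[tuple[float, float]],
    perturbed: list[tuple[float, float]],
    threshold_m: float = 1.0,  # hypothetical tolerance in metres
) -> bool:
    """Flag a trajectory as fragile if any waypoint moves more than threshold_m
    between the clean run and the perturbed run."""
    return max(math.dist(p, q) for p, q in zip(base, perturbed)) > threshold_m


if __name__ == "__main__":
    batch = [
        Inference({"car_1", "pedestrian_1"}, {"car_1", "pedestrian_1"}, "stop", "stop"),
        Inference({"car_1", "cyclist_1"}, {"car_1", "pedestrian_2"}, "stop", "continue"),
    ]
    print([entity_fidelity(i) for i in batch])        # [1.0, 0.5]
    print(sum(missed_pedestrians(i) for i in batch))  # 1
    print(stop_claim_violation_rate(batch))           # 0.5
```

Averaging these per-inference scores over a batch yields rates of the same kind the abstract reports. The stop-claim check only captures one slice of reasoning-action consistency, however, and the paper's own verification criteria may be defined differently.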
