RO · AI · May 20

Lost in Fog: Sensor Perturbations Expose Reasoning Fragility in Driving VLAs

arXiv:2605.21446 · 24.01 citations
Predicted impact top 72% in RO · last 90 days · Originality Incremental advance
AI Analysis

For developers of autonomous driving systems, this work establishes reasoning consistency as a quantitative proxy for planning safety, enabling runtime monitoring of VLA models.

This paper evaluates the robustness of a Vision-Language-Action (VLA) driving model under sensor perturbations, finding that reasoning consistency (Chain-of-Causation, CoC) strongly correlates with trajectory reliability: when explanations change after perturbation, trajectory deviation increases 5.3× (21.8 m vs 4.1 m; r=0.99 across attack types). In a controlled ablation, enabling CoC generation is associated with an 11.8% average improvement in trajectory accuracy.

Interpretable autonomous driving planners depend not only on generating explanations, but also on those explanations remaining reliable under real-world sensor degradation. In this paper we present a controlled perturbation study of Vision-Language-Action (VLA) robustness in autonomous driving, evaluating Alpamayo R1 (10B parameters) across 1,996 scenarios under eight sensor perturbations (Gaussian noise at four intensities, two lighting extremes, and two fog levels; ${\sim}18{,}000$ inference trials). We find that reasoning consistency is a high-fidelity indicator of trajectory reliability: when Chain-of-Causation (CoC) explanations change after perturbation, trajectory deviation spikes $5.3{\times}$ (21.8m vs 4.1m), with $r\!=\!0.99$ across attack types and $r_{pb}\!=\!0.53$ per-sample (Cohen's $d\!=\!1.12$). A controlled ablation provides evidence that enabling CoC generation is associated with improved trajectory accuracy (11.8% on average across conditions; $p < 0.0001$) under matched inference settings. Over the tested noise range ($\sigma \in \{10, 30, 50, 70\}$), degradation is approximately linear ($R^2\!=\!0.957$), while standard input preprocessing defenses provide only marginal relief. Together, these results establish CoC consistency as a quantitative proxy for planning safety and motivate reasoning-based runtime monitoring for safer VLA deployment.
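To make the per-sample statistics in the abstract concrete, the following is a minimal sketch of how a deviation ratio, a point-biserial correlation, and Cohen's d could be computed from a binary "CoC changed" flag and a trajectory deviation in metres. The data values, variable names, and helper functions are illustrative assumptions, not the paper's data or code; the point-biserial coefficient is computed here as a plain Pearson correlation against a 0/1 flag, and Cohen's d uses a pooled sample standard deviation, which may differ from the authors' exact estimator.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d using a pooled standard deviation from sample variances."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def point_biserial(flags, values):
    """Pearson correlation between a 0/1 flag and a continuous value."""
    mean_f = statistics.mean(flags)
    mean_v = statistics.mean(values)
    cov = sum((f - mean_f) * (v - mean_v) for f, v in zip(flags, values))
    ss_f = sum((f - mean_f) ** 2 for f in flags)
    ss_v = sum((v - mean_v) ** 2 for v in values)
    return cov / math.sqrt(ss_f * ss_v)


# Hypothetical toy data: 1 means the CoC explanation changed after perturbation.
coc_changed = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical trajectory deviation (metres) for the same eight trials.
deviation_m = [18.0, 25.0, 22.0, 3.0, 5.0, 4.0, 6.0, 2.0]

changed = [v for f, v in zip(coc_changed, deviation_m) if f == 1]
stable = [v for f, v in zip(coc_changed, deviation_m) if f == 0]

# Ratio of mean deviations, analogous to the reported 21.8 m vs 4.1 m comparison.
ratio = statistics.mean(changed) / statistics.mean(stable)
r_pb = point_biserial(coc_changed, deviation_m)
d = cohens_d(changed, stable)

print(f"deviation ratio: {ratio:.2f}x")
print(f"point-biserial r: {r_pb:.2f}")
print(f"Cohen's d: {d:.2f}")

# The headline ratio quoted in the abstract follows from its two means.
reported_ratio = 21.8 / 4.1
print(f"reported ratio: {reported_ratio:.1f}x")
```

In a runtime monitor built on this idea, the binary flag would come from comparing the explanation generated on the clean input with the one generated on the perturbed input; how that comparison is defined is not specified in the abstract and is left open here.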

