RO AI Sep 23, 2025

The Case for Negative Data: From Crash Reports to Counterfactuals for Reasonable Driving

Jay Patrikar, Apoorva Sharma, Sushant Veer, Boyi Li, Sebastian Scherer, Marco Pavone

arXiv:2509.18626v17.82 citations h-index: 11

Originality: Incremental advance

AI Analysis

This work addresses safety-critical decision-making in autonomous driving by leveraging unstructured crash data, though it is incremental in applying retrieval and counterfactual reasoning to a known domain.

The paper addresses the lack of guidance that autonomous driving systems have near safety boundaries by using crash reports as contrastive evidence. On a nuScenes benchmark, precedent retrieval improves calibration, raising recall on contextually preferred actions from 24% to 53%.

Learning-based autonomous driving systems are trained mostly on incident-free data, offering little guidance near safety-performance boundaries. Real crash reports contain precisely the contrastive evidence needed, but they are hard to use: narratives are unstructured, third-person, and poorly grounded to sensor views. We address these challenges by normalizing crash narratives to ego-centric language and converting both logs and crashes into a unified scene-action representation suitable for retrieval. At decision time, our system adjudicates proposed actions by retrieving relevant precedents from this unified index; an agentic counterfactual extension proposes plausible alternatives, retrieves for each, and reasons across outcomes before deciding. On a nuScenes benchmark, precedent retrieval substantially improves calibration, with recall on contextually preferred actions rising from 24% to 53%. The counterfactual variant preserves these gains while sharpening decisions near risk.
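The decision-time loop described in the abstract (retrieve precedents for a proposed action, then propose alternatives, retrieve for each, and compare outcomes) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the `Precedent` record, the token-overlap `similarity` (a simple stand-in for whatever retrieval the paper actually uses), the crash-fraction `risk` score, and the toy index entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Precedent:
    """Hypothetical unified scene-action record (not the paper's schema)."""
    scene: str     # ego-centric scene description
    action: str    # action the ego vehicle took
    crashed: bool  # True if drawn from a crash report, False if from a driving log


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens; a stand-in for a real retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(index, scene, action, k=2):
    """Return the k precedents most similar to the (scene, action) query."""
    query = f"{scene} {action}"
    ranked = sorted(
        index,
        key=lambda p: similarity(query, f"{p.scene} {p.action}"),
        reverse=True,
    )
    return ranked[:k]


def risk(index, scene, action, k=2):
    """Fraction of retrieved precedents that ended in a crash."""
    hits = retrieve(index, scene, action, k)
    return sum(p.crashed for p in hits) / len(hits) if hits else 0.0


def decide(index, scene, proposed, alternatives, k=2):
    """Counterfactual step: score the proposal and each alternative, keep the safest.

    Ties favour the originally proposed action because it is listed first.
    """
    candidates = [proposed, *alternatives]
    scores = {a: risk(index, scene, a, k) for a in candidates}
    best = min(candidates, key=lambda a: scores[a])
    return best, scores


# Made-up toy index mixing crash-derived and log-derived precedents.
INDEX = [
    Precedent("wet road pedestrian crossing ahead", "maintain speed", True),
    Precedent("wet road pedestrian crossing near", "maintain speed", True),
    Precedent("wet road pedestrian crossing ahead", "slow down", False),
    Precedent("dry road clear lane", "maintain speed", False),
]

best, scores = decide(
    INDEX,
    scene="wet road pedestrian crossing ahead",
    proposed="maintain speed",
    alternatives=["slow down"],
)
print(best, scores)
```

In this toy setup, the crash-derived precedents raise the risk score of the proposed action, so the alternative is selected. The paper's system reasons across retrieved outcomes rather than counting crashes, so the scoring rule above should be read only as a placeholder for that step.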
