World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving
This work provides a robust solution for safety-critical autonomous driving applications by improving accident anticipation through generative scene augmentation and adaptive temporal reasoning.
The paper tackles traffic accident anticipation in autonomous driving by addressing two obstacles, data scarcity and missing object-level cues. Experiments on public and newly released datasets show improved accuracy and longer lead time.
Reliable anticipation of traffic accidents is essential for advancing autonomous driving systems. However, this objective is limited by two fundamental challenges: the scarcity of diverse, high-quality training data and the frequent absence of crucial object-level cues due to environmental disruptions or sensor deficiencies. To tackle these issues, we propose a comprehensive framework combining generative scene augmentation with adaptive temporal reasoning. Specifically, we develop a video generation pipeline that utilizes a world model guided by domain-informed prompts to create high-resolution, statistically consistent driving scenarios, particularly enriching the coverage of edge cases and complex interactions. In parallel, we construct a dynamic prediction model that encodes spatio-temporal relationships through strengthened graph convolutions and dilated temporal operators, effectively addressing data incompleteness and transient visual noise. Furthermore, we release a new benchmark dataset designed to better capture diverse real-world driving risks. Extensive experiments on public and newly released datasets confirm that our framework enhances both the accuracy and lead time of accident anticipation, offering a robust solution to current data and modeling limitations in safety-critical autonomous driving applications.
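To make the idea of a prediction model built from graph convolutions and dilated temporal operators concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' architecture: the function names, the symmetric adjacency normalization, the mean pooling over detected objects, the depthwise causal convolution, and the sigmoid risk head are all hypothetical choices. It assumes each frame provides per-object feature vectors, an object-interaction adjacency matrix, and a presence mask marking which objects were actually detected.

```python
import numpy as np

def normalized_adjacency(adj, mask):
    """Symmetrically normalize adjacency with self-loops, ignoring missing objects."""
    m = np.asarray(mask, dtype=float)
    a = adj * m[:, None] * m[None, :]   # drop edges that touch a missing object
    a = a + np.diag(m)                  # self-loops only for detected objects
    deg = a.sum(axis=1)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    return a * inv_sqrt[:, None] * inv_sqrt[None, :]

def graph_conv(x, adj, mask, weight):
    """One graph convolution, ReLU(A_hat @ X @ W); x is (N, F), weight is (F, H)."""
    return np.maximum(normalized_adjacency(adj, mask) @ x @ weight, 0.0)

def dilated_causal_conv(seq, kernel, dilation):
    """Depthwise causal convolution over time.

    seq is (T, H), kernel is (K, H). Output at step t combines
    seq[t], seq[t - dilation], ..., seq[t - (K - 1) * dilation],
    with zeros standing in for steps before the start of the clip.
    """
    T, H = seq.shape
    K = kernel.shape[0]
    pad = (K - 1) * dilation
    padded = np.vstack([np.zeros((pad, H)), seq])
    out = np.zeros((T, H))
    for k in range(K):
        start = pad - k * dilation      # tap k looks k * dilation steps back
        out += kernel[k] * padded[start:start + T]
    return out

def anticipate(features, adjs, masks, w_gc, kernel, w_out, dilation=2):
    """Return one accident-risk score in [0, 1] per frame (hypothetical head)."""
    frame_embeddings = []
    for x, a, m in zip(features, adjs, masks):
        m = np.asarray(mask_to_float(m))
        h = graph_conv(x, a, m, w_gc)                       # (N, H)
        n_present = max(m.sum(), 1.0)
        frame_embeddings.append((h * m[:, None]).sum(axis=0) / n_present)
    seq = np.stack(frame_embeddings)                        # (T, H)
    temporal = dilated_causal_conv(seq, kernel, dilation)   # (T, H)
    logits = temporal @ w_out                               # (T,)
    return 1.0 / (1.0 + np.exp(-logits))

def mask_to_float(mask):
    """Convert a boolean or 0/1 presence mask to floats."""
    return np.asarray(mask, dtype=float)
```

In this sketch, masking removes an undetected object from both the graph and the pooled frame embedding, so its (possibly stale or noisy) features cannot influence the score. The causal padding means the score at frame t depends only on frames up to t, which is what an anticipation setting requires. Larger dilations widen the temporal context without adding kernel taps.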