RO LG May 19, 2025

Learning collision risk proactively from naturalistic driving data at scale

Yiru Jiao, Simeon C. Calvert, Sander van Cranenburgh, Hans van Lint

arXiv:2505.13556v35.72 citations h-index: 21 Has Code

Originality: Incremental advance

AI Analysis

This work addresses road safety for autonomous driving systems and traffic management by providing a scalable, context-aware method for quantifying collision risk, though it builds incrementally on existing data-driven approaches.

The study tackles the problem of proactively alerting drivers or automated systems to emerging collisions by introducing the Generalised Surrogate Safety Measure (GSSM), which learns collision risk from naturalistic driving data without crash or risk labels. A basic GSSM using only instantaneous motion kinematics achieves an area under the precision-recall curve of 0.9 and a median time advance of 2.6 seconds for preventing potential collisions.

Accurately and proactively alerting drivers or automated systems to emerging collisions is crucial for road safety, particularly in highly interactive and complex urban environments. However, existing approaches to identifying potential collisions either require labour-intensive annotation of sparse risk, struggle to consider varying contextual factors, or are only useful in specific scenarios. To address these limits, this study introduces the Generalised Surrogate Safety Measure (GSSM), a new data-driven approach that learns collision risk exclusively from naturalistic driving without the need for crash or risk labels. GSSM captures the patterns of normal driving and estimates the extent to which a traffic interaction deviates from the norm towards an unsafe state. Diverse data from naturalistic driving, such as motion kinematics, weather, and lighting, are used to train multiple GSSMs, which are tested with 2,591 reconstructed real-world crashes and near-crashes. These test events are also released here as the largest dataset of its kind to date. A basic GSSM using only instantaneous motion kinematics achieves an area under the precision-recall curve of 0.9 and secures a median time advance of 2.6 seconds to prevent potential collisions. Additional interaction patterns and contextual factors provide further performance gains. Across various types of collision risk scenarios (such as rear-end, merging, and turning interactions), the accuracy and timeliness of GSSM consistently outperform existing baselines. GSSM therefore establishes a scalable, context-aware, and generalisable foundation for proactively quantifying collision risk in traffic interactions. This can support and facilitate autonomous driving systems, traffic safety assessment, and road emergency management. Code and experiment data are openly accessible at https://github.com/Yiru-Jiao/GSSM.
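
The two headline metrics in the abstract, area under the precision-recall curve and median time advance, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `average_precision` and `median_time_advance`, the step-wise (average precision) estimator, and the definition of time advance as impact time minus first-warning time are all assumptions made for illustration, and the numbers are toy values rather than results from the paper.

```python
import numpy as np


def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    labels: 1 for a risky sample, 0 for a safe one (hypothetical ground truth).
    scores: risk scores, higher meaning more dangerous. Ties are not treated specially.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")      # rank samples by descending risk
    hits = labels[order]
    true_positives = np.cumsum(hits)
    precision = true_positives / np.arange(1, len(hits) + 1)
    # Average the precision at every rank where a risky sample is retrieved.
    return float(precision[hits].sum() / hits.sum())


def median_time_advance(first_warning_times, impact_times):
    """Median lead time in seconds between the first warning and the (near-)impact.

    Assumed definition for illustration; the paper may define time advance differently.
    """
    lead = np.asarray(impact_times, dtype=float) - np.asarray(first_warning_times, dtype=float)
    return float(np.median(lead))


# Toy values only, not taken from the paper or its dataset.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)

toy_lead = median_time_advance([1.0, 2.5, 0.5], [4.0, 5.0, 2.0])

print(f"average precision: {toy_ap:.3f}")
print(f"median time advance: {toy_lead:.1f} s")
```

An area under the precision-recall curve is a sensible summary when risky moments are rare compared with normal driving, because it is driven by how well the positives are ranked and is not inflated by the large number of safe samples.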
