CV AI Mar 9, 2025

Evaluation of Safety Cognition Capability in Vision-Language Models for Autonomous Driving

Enming Zhang, Peizhe Gong, Xingyuan Dai, Min Huang, Yisheng Lv, Qinghai Miao

arXiv:2503.06497v3, 7 citations, h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses safety cognition for autonomous driving systems, but it is incremental as it focuses on evaluation and dataset creation rather than a new model or paradigm.

The authors address the lack of safety-critical evaluation for vision-language models in autonomous driving by introducing SCD-Bench, an evaluation framework, and SCD-Training, a dataset of 324.35K samples. Models trained on SCD-Training show marked improvements on SCD-Bench as well as on general and domain-specific benchmarks.

Ensuring the safety of vision-language models (VLMs) in autonomous driving systems is of paramount importance, yet existing research has largely focused on conventional benchmarks rather than safety-critical evaluation. In this work, we present SCD-Bench (Safety Cognition Driving Benchmark), a novel framework specifically designed to assess the safety cognition capabilities of VLMs within interactive driving scenarios. To address the scalability challenge of data annotation, we introduce ADA (Autonomous Driving Annotation), a semi-automated labeling system, further refined through expert review by professionals with domain-specific knowledge in autonomous driving. To facilitate scalable and consistent evaluation, we also propose an automated assessment pipeline leveraging large language models, which demonstrates over 98% agreement with human expert judgments. In addressing the broader challenge of aligning VLMs with safety cognition in driving environments, we construct SCD-Training, the first large-scale dataset tailored for this task, comprising 324.35K high-quality samples. Through extensive experiments, we show that models trained on SCD-Training exhibit marked improvements not only on SCD-Bench, but also on general and domain-specific benchmarks, offering a new perspective on enhancing safety-aware interactions in vision-language systems for autonomous driving.
