CV AI CL May 21, 2025

Better Safe Than Sorry? Overreaction Problem of Vision Language Models in Visual Emergency Recognition

Dasol Choi, Seunghyun Lee, Youngsook Song

arXiv:2505.15367v3, 3 citations, h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses unreliable VLM performance in emergency recognition for safety-critical applications and highlights a systematic overreaction bias. The contribution is rated as an incremental advance.

The study examines the reliability of Vision-Language Models (VLMs) in safety-critical scenarios by introducing the VERI benchmark. It reveals an "overreaction problem": models achieve high recall (70-100%) but low precision, misclassifying 31-96% of safe situations as dangerous.

Vision-Language Models (VLMs) have shown capabilities in interpreting visual content, but their reliability in safety-critical scenarios remains insufficiently explored. We introduce VERI, a diagnostic benchmark comprising 200 synthetic images (100 contrastive pairs) and an additional 50 real-world images (25 pairs) for validation. Each emergency scene is paired with a visually similar but safe counterpart through human verification. Using a two-stage evaluation protocol (risk identification and emergency response), we assess 17 VLMs across medical emergencies, accidents, and natural disasters. Our analysis reveals an "overreaction problem": models achieve high recall (70-100%) but suffer from low precision, misclassifying 31-96% of safe situations as dangerous. Seven safe scenarios were universally misclassified by all models. This "better-safe-than-sorry" bias stems from contextual overinterpretation (88-98% of errors). Both synthetic and real-world datasets confirm these systematic patterns, challenging VLM reliability in safety-critical applications. Addressing this requires enhanced contextual reasoning in ambiguous visual situations.
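To make the relationship between high recall, a high false-alarm rate on safe scenes, and low precision concrete, here is a minimal Python sketch of how such metrics could be computed over contrastive pairs. It is not the authors' evaluation code. The `PairResult` type, the `overreaction_metrics` function, and the sample data are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PairResult:
    """Hypothetical record of one model's verdicts on a contrastive pair."""
    emergency_flagged: bool  # model called the emergency image dangerous
    safe_flagged: bool       # model called the safe counterpart dangerous


def overreaction_metrics(results):
    """Compute recall, false positive rate, and precision over paired images.

    Each pair contributes exactly one emergency image and one safe image,
    so the two classes are balanced by construction.
    """
    n = len(results)
    if n == 0:
        raise ValueError("at least one pair is required")
    true_positives = sum(r.emergency_flagged for r in results)
    false_positives = sum(r.safe_flagged for r in results)
    flagged = true_positives + false_positives
    return {
        "recall": true_positives / n,
        "false_positive_rate": false_positives / n,
        "precision": true_positives / flagged if flagged else 0.0,
    }


# Made-up data (not from the paper): 10 pairs, every emergency is caught,
# but 6 of the 10 safe counterparts are also flagged as dangerous.
results = [PairResult(emergency_flagged=True, safe_flagged=(i < 6))
           for i in range(10)]
metrics = overreaction_metrics(results)
print(metrics)
```

In this made-up example the model catches every emergency (recall 1.0) yet flags 60% of the safe scenes, so only 10 of its 16 danger verdicts are correct (precision 0.625). This is the pattern the abstract describes as overreaction: recall alone looks strong while precision exposes the false alarms.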
