CL · AI · May 23, 2025

Towards Evaluating Proactive Risk Awareness of Multimodal Language Models

Youliang Yuan, Wenxiang Jiao, Yuejin Xie, Chihao Shen, Menghan Tian, Wenxuan Wang, Jen-tse Huang, Pinjia He

Peking U · Tencent

arXiv:2505.17455v20.126 citations · h-index: 26 · Has Code

AI Analysis: 55

This work addresses the need for safer AI assistants that can actively prevent harm, though it is incremental as it focuses on benchmarking rather than developing new methods.

The paper evaluates proactive risk awareness in multimodal language models by introducing the Proactive Safety Bench (PaSBench), a benchmark with 416 multimodal scenarios across 5 safety-critical domains. It finds that even top models like Gemini-2.5-pro achieve only 71% image and 64% text accuracy, and miss 45-55% of risks in repeated trials.

Human safety awareness gaps often prevent the timely recognition of everyday risks. A proactive safety artificial intelligence (AI) system would address this problem better than a reactive one. Instead of just reacting to users' questions, it would actively watch people's behavior and their environment to detect potential dangers in advance. Our Proactive Safety Bench (PaSBench) evaluates this capability through 416 multimodal scenarios (128 image sequences, 288 text logs) spanning 5 safety-critical domains. Evaluation of 36 advanced models reveals fundamental limitations: top performers like Gemini-2.5-pro achieve 71% image and 64% text accuracy, but miss 45-55% of risks in repeated trials. Through failure analysis, we identify unstable proactive reasoning, rather than knowledge deficits, as the primary limitation. This work establishes (1) a proactive safety benchmark, (2) systematic evidence of model limitations, and (3) critical directions for developing reliable protective AI. We believe our dataset and findings can promote the development of safer AI assistants that actively prevent harm rather than merely respond to requests. Our dataset can be found at https://huggingface.co/datasets/Youliang/PaSBench.
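
The gap between single-trial accuracy and misses under repeated trials can be illustrated with a small sketch. This is not the paper's evaluation code. It assumes one plausible reading of "repeated trials", namely that a risk counts as reliably detected only if the model flags it in every trial. The scenario names, the trial outcomes, and both function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical outcomes: for each scenario, whether the model flagged
# the risk in each of four independent trials (illustrative data only).
results: Dict[str, List[bool]] = {
    "kitchen_01": [True, True, True, True],
    "road_02": [True, False, True, True],
    "lab_03": [True, True, False, True],
    "home_04": [False, True, True, True],
}


def single_trial_accuracy(results: Dict[str, List[bool]]) -> float:
    """Fraction of all (scenario, trial) pairs in which the risk was flagged."""
    total = sum(len(trials) for trials in results.values())
    hits = sum(sum(trials) for trials in results.values())
    return hits / total


def consistent_detection_rate(results: Dict[str, List[bool]]) -> float:
    """Fraction of scenarios flagged in every trial (assumed criterion)."""
    consistent = sum(all(trials) for trials in results.values())
    return consistent / len(results)


if __name__ == "__main__":
    print(f"single-trial accuracy:     {single_trial_accuracy(results):.4f}")
    print(f"consistent detection rate: {consistent_detection_rate(results):.4f}")
```

With this toy data the model is right in 13 of 16 individual trials, yet only 1 of 4 scenarios is flagged every time. A high per-trial score can therefore coexist with many risks being missed at least once, which is the kind of instability the abstract attributes to proactive reasoning.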
