CR AI CL CV

Nov 29, 2024

VLSBench: Unveiling Visual Leakage in Multimodal Safety

Xuhao Hu, Dongrui Liu, Hao Li, Xuanjing Huang, Jing Shao

arXiv:2411.19939v327.961 citations h-index: 11 Has Code

Originality: Incremental advance

AI Analysis

This work addresses a critical evaluation flaw in multimodal safety that matters to AI researchers and developers, though it is an incremental advance because it builds on prior work on safety benchmarks.

The paper identifies Visual Safety Information Leakage (VSIL) in existing multimodal safety benchmarks: the textual query reveals the risky content of the image, which makes evaluations of multimodal large language models (MLLMs) unreliable. The authors introduce VLSBench, a new benchmark with 2.2k image-text pairs that challenges models such as LLaVA and GPT-4o. They show that textual alignment suffices for scenarios with VSIL, while multimodal alignment is preferable for scenarios without it.

Safety concerns of multimodal large language models (MLLMs) have gradually become an important problem in various applications. Surprisingly, previous works indicate a counterintuitive phenomenon: using textual unlearning to align MLLMs achieves safety performance comparable to that of MLLMs aligned with image-text pairs. To explain this phenomenon, we discover a Visual Safety Information Leakage (VSIL) problem in existing multimodal safety benchmarks, i.e., the potentially risky content in the image has been revealed in the textual query. Thus, MLLMs can easily refuse these sensitive image-text pairs according to the textual queries alone, leading to unreliable cross-modality safety evaluation of MLLMs. We also conduct a further comparison experiment between textual alignment and multimodal alignment to highlight this drawback. To this end, we construct the multimodal Visual Leakless Safety Bench (VLSBench) with 2.2k image-text pairs through an automated data pipeline. Experimental results indicate that VLSBench poses a significant challenge to both open-source and closed-source MLLMs, e.g., LLaVA, Qwen2-VL and GPT-4o. In addition, we empirically compare textual and multimodal alignment methods on VLSBench and find that textual alignment is effective enough for multimodal safety scenarios with VSIL, while multimodal alignment is preferable for safety scenarios without VSIL. Code and data are released at https://github.com/AI45Lab/VLSBench
