CR · AI · CL · CV · Nov 29, 2024

VLSBench: Unveiling Visual Leakage in Multimodal Safety

arXiv:2411.19939v3 · 54 citations · h-index: 11 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses a critical flaw in how multimodal safety is evaluated, which matters to AI researchers and developers. It is an incremental advance because it builds on prior work on safety benchmarks.

The paper identifies Visual Safety Information Leakage (VSIL) in existing multimodal safety benchmarks: the textual query already reveals the risky content of the image, which makes evaluations of multimodal large language models (MLLMs) unreliable. The authors introduce VLSBench, a new benchmark of 2.2k image-text pairs that challenges models such as LLaVA and GPT-4o. They also show that textual alignment suffices for scenarios with VSIL, while multimodal alignment is preferable for scenarios without it.

Safety concerns of multimodal large language models (MLLMs) have gradually become an important problem in various applications. Surprisingly, previous works indicate a counterintuitive phenomenon: using textual unlearning to align MLLMs achieves safety performance comparable to that of MLLMs aligned with image-text pairs. To explain this phenomenon, we discover a Visual Safety Information Leakage (VSIL) problem in existing multimodal safety benchmarks, i.e., the potentially risky content in the image has already been revealed in the textual query. Thus, MLLMs can easily refuse these sensitive image-text pairs based on the textual queries alone, leading to unreliable cross-modality safety evaluation of MLLMs. We also conduct a further comparison experiment between textual alignment and multimodal alignment to highlight this drawback. To this end, we construct the multimodal Visual Leakless Safety Bench (VLSBench) with 2.2k image-text pairs through an automated data pipeline. Experimental results indicate that VLSBench poses a significant challenge to both open-source and closed-source MLLMs, e.g., LLaVA, Qwen2-VL and GPT-4o. Besides, we empirically compare textual and multimodal alignment methods on VLSBench and find that textual alignment is effective enough for multimodal safety scenarios with VSIL, while multimodal alignment is preferable for safety scenarios without VSIL. Code and data are released at https://github.com/AI45Lab/VLSBench
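To make the leakage argument concrete, the toy sketch below contrasts a leaky pair with a leakless one. It is an illustration under stated assumptions, not the VLSBench data pipeline or any model's safety mechanism: the image is stood in for by a caption, and `RISK_TERMS`, `leaks_visual_risk`, and `text_only_refusal` are hypothetical names introduced here. The point it shows is that a filter reading only the query catches the leaky pair but misses the leakless one, where the risk is visible only in the image.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical keyword list for illustration only; not taken from VLSBench.
RISK_TERMS = {"gun", "knife", "drug", "bomb", "poison"}


@dataclass
class Sample:
    image_caption: str  # assumption: a caption stands in for the image content
    query: str          # the textual query paired with the image


def risk_terms(text: str) -> set[str]:
    """Return the hypothetical risk terms that appear as words in `text`."""
    words = {w.strip(".,?!").lower() for w in text.split()}
    return words & RISK_TERMS


def leaks_visual_risk(sample: Sample) -> bool:
    """Toy VSIL check: the query repeats a risk term that is visible in the image."""
    return bool(risk_terms(sample.image_caption) & risk_terms(sample.query))


def text_only_refusal(query: str) -> bool:
    """Toy text-only filter: refuse if the query alone mentions a risk term."""
    return bool(risk_terms(query))


# Same image, two queries: one leaks the risky object, one does not.
leaky = Sample("a person holding a knife near a crowd",
               "How do I use this knife to hurt people?")
leakless = Sample("a person holding a knife near a crowd",
                  "How do I use this to get their attention?")

for name, s in [("leaky", leaky), ("leakless", leakless)]:
    print(name, leaks_visual_risk(s), text_only_refusal(s.query))
```

In this toy setting the text-only filter refuses the leaky query and lets the leakless one through, even though both pairs carry the same risk in the image. That gap is the kind a leakless benchmark is meant to expose, since handling it requires reasoning over the image rather than the text alone.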

Code Implementations: 1 repo
