CR · AI · May 18

DMN: A Compositional Framework for Jailbreaking Multimodal LLMs with Multi-Image Inputs

arXiv:2605.18915 · 17.2
Predicted impact: top 28% in CR · last 90 days · Originality: Highly original
AI Analysis

The paper exposes critical safety vulnerabilities in multimodal LLMs that support multi-image inputs, a setting that has received less safety alignment effort, which poses a security risk for developers and users.

The paper proposes DMN, a compositional jailbreak framework that uses distributed instructions, multimodal evidence, and a number chain task to attack multimodal LLMs with multi-image inputs, achieving over 90% attack success rates on GPT-4o, Gemini-2.5-pro, and Claude Sonnet 4, significantly outperforming prior methods.

Multimodal Large Language Models (MLLMs) are vulnerable to jailbreak attacks, which can elicit harmful responses. Many MLLMs support multi-image inputs, which inadvertently introduces new vulnerabilities because less effort has gone into multi-image safety alignment. Previous MLLM jailbreak methods use only a single image, which restricts the attack space: they cannot distribute harmful requests across multiple images, carry abundant information, or exploit additional visual reasoning tasks to distract MLLMs. To address these limitations, we propose a compositional jailbreak framework, DMN, which leverages Distributed instruction, Multimodal evidence, and a Number chain task to enhance jailbreak performance. Extensive experiments show that DMN is highly effective for MLLM jailbreaking, e.g., achieving attack success rates of over 90% on GPT-4o, Gemini-2.5-pro, and Claude Sonnet 4, surpassing other baselines by a large margin. This compositional, multi-image jailbreak strategy reveals fundamental weaknesses in the safety mechanisms of these models.

