CL · Dec 21, 2024

Divide and Conquer: A Hybrid Strategy Defeats Multimodal Large Language Models

arXiv:2412.16555v3 · 2 citations
Originality: Incremental advance
AI Analysis

The paper addresses security vulnerabilities in LLMs that matter to both developers and users, though the contribution is incremental because it builds on existing jailbreaking methods.

The paper proposes JMLLM, a hybrid multimodal jailbreaking method for large language models, and reports advanced attack success rates with significantly reduced time overhead across 13 popular models.

Large language models (LLMs) are widely applied in various fields of society due to their powerful reasoning, understanding, and generation capabilities. However, the security issues associated with these models are becoming increasingly severe. Jailbreaking attacks, as an important method for detecting vulnerabilities in LLMs, have been explored by researchers who attempt to induce these models to generate harmful content through various attack methods. Nevertheless, existing jailbreaking methods face numerous limitations, such as excessive query counts, limited coverage of jailbreak modalities, low attack success rates, and simplistic evaluation methods. To overcome these constraints, this paper proposes a multimodal jailbreaking method: JMLLM. This method integrates multiple strategies to perform comprehensive jailbreak attacks across text, visual, and auditory modalities. Additionally, we contribute a new and comprehensive dataset for multimodal jailbreaking research: TriJail, which includes jailbreak prompts for all three modalities. Experiments on the TriJail dataset and the benchmark dataset AdvBench, conducted on 13 popular LLMs, demonstrate advanced attack success rates and significant reduction in time overhead.

