CVCLNov 14, 2024

Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey

arXiv:2411.09259v231 citationsh-index: 10Has Code
Originality Synthesis-oriented
AI Analysis

It addresses security risks in deploying multimodal generative models, particularly in sensitive applications, but is incremental as it synthesizes existing research rather than introducing new methods.

This survey tackles the problem of jailbreak attacks on multimodal generative models, which can bypass safety mechanisms to produce harmful content, and reviews existing attack methods and defense strategies across various levels and modalities.

The rapid evolution of multimodal foundation models has led to significant advancements in cross-modal understanding and generation across diverse modalities, including text, images, audio, and video. However, these models remain susceptible to jailbreak attacks, which can bypass built-in safety mechanisms and induce the production of potentially harmful content. Consequently, understanding the methods of jailbreak attacks and existing defense mechanisms is essential to ensure the safe deployment of multimodal generative models in real-world scenarios, particularly in security-sensitive applications. To provide comprehensive insight into this topic, this survey reviews jailbreak and defense in multimodal generative models. First, given the generalized lifecycle of multimodal jailbreak, we systematically explore attacks and corresponding defense strategies across four levels: input, encoder, generator, and output. Based on this analysis, we present a detailed taxonomy of attack methods, defense mechanisms, and evaluation frameworks specific to multimodal generative models. Additionally, we cover a wide range of input-output configurations, including modalities such as Any-to-Text, Any-to-Vision, and Any-to-Any within generative systems. Finally, we highlight current research challenges and propose potential directions for future research. The open-source repository corresponding to this work can be found at https://github.com/liuxuannan/Awesome-Multimodal-Jailbreak.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes