CR CV

Nov 29, 2023

MMA-Diffusion: MultiModal Attack on Diffusion Models

Yijun Yang, Ruiyuan Gao, Xiaosen Wang, Tsung-Yi Ho, Nan Xu, Qiang Xu

arXiv:2311.17516v439.1214 citations | h-index: 11 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses security risks for users and developers of T2I models by highlighting realistic threats, though it is incremental as it builds on prior attack methods.

The paper studies the risk of inappropriate content generation in Text-to-Image (T2I) models by introducing MMA-Diffusion, a multimodal attack framework that bypasses current defensive measures in both open-source models and commercial services, exposing vulnerabilities in existing safeguards.

In recent years, Text-to-Image (T2I) models have seen remarkable advancements, gaining widespread adoption. However, this progress has inadvertently opened avenues for potential misuse, particularly in generating inappropriate or Not-Safe-For-Work (NSFW) content. Our work introduces MMA-Diffusion, a framework that presents a significant and realistic threat to the security of T2I models by effectively circumventing current defensive measures in both open-source models and commercial online services. Unlike previous approaches, MMA-Diffusion leverages both textual and visual modalities to bypass safeguards like prompt filters and post-hoc safety checkers, thus exposing and highlighting the vulnerabilities in existing defense mechanisms.
