MMA-Diffusion: MultiModal Attack on Diffusion Models
This work addresses security risks for users and developers of T2I models by highlighting realistic threats, although its contribution is incremental, since it builds on prior attack methods.
The paper tackles the problem of inappropriate content generation in Text-to-Image (T2I) models by introducing MMA-Diffusion, a multimodal attack framework that bypasses current defensive measures in both open-source models and commercial services, exposing vulnerabilities in existing safeguards.
In recent years, Text-to-Image (T2I) models have advanced remarkably and gained widespread adoption. However, this progress has inadvertently opened avenues for misuse, particularly the generation of inappropriate or Not-Safe-For-Work (NSFW) content. Our work introduces MMA-Diffusion, a framework that poses a significant and realistic threat to the security of T2I models by circumventing current defensive measures in both open-source models and commercial online services. Unlike previous approaches, MMA-Diffusion exploits both the textual and visual modalities to bypass safeguards such as prompt filters and post-hoc safety checkers, thereby exposing vulnerabilities in existing defense mechanisms.