CV · AI · CR · Apr 8, 2025

Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking

arXiv:2504.05838v1 · 6.25 citations · h-index: 9 · Has Code · CVPR

Originality: Highly original

AI Analysis

The paper exposes a security vulnerability in widely used AI image-generation systems, one that puts both service providers and their users at risk through scalable deception.

The paper shows that integrating Image Prompt Adapters (IP-Adapters) into text-to-image diffusion models enables a new hijacking attack, in which adversaries use imperceptible adversarial examples to jailbreak image generation services and discredit their providers. Extensive experiments confirm that the attack is feasible.

Recently, the Image Prompt Adapter (IP-Adapter) has been increasingly integrated into text-to-image diffusion models (T2I-DMs) to improve controllability. However, in this paper, we reveal that T2I-DMs equipped with the IP-Adapter (T2I-IP-DMs) enable a new jailbreak attack, which we call the hijacking attack. We demonstrate that, by uploading imperceptible image-space adversarial examples (AEs), an adversary can hijack large numbers of benign users into jailbreaking an Image Generation Service (IGS) driven by T2I-IP-DMs, and can mislead the public into discrediting the service provider. Worse still, the IP-Adapter's dependency on open-source image encoders reduces the knowledge required to craft AEs. Extensive experiments verify the technical feasibility of the hijacking attack. In light of this threat, we investigate several existing defenses and explore combining the IP-Adapter with adversarially trained models to overcome the limitations of existing defenses. Our code is available at https://github.com/fhdnskfbeuv/attackIPA.
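The abstract does not say how the AEs are crafted. Its point about open-source image encoders, though, implies that an attacker can compute gradients through the encoder directly. As a rough illustration of that general idea only, and not the authors' method, the sketch below runs a textbook targeted projected gradient descent (PGD) attack under an L-infinity budget. It pushes a "toy" linear encoder's embedding of an image toward a target embedding. The encoder `W`, the functions `matvec`, `embedding_loss`, and `pgd_embedding_attack`, and the parameters `eps`, `alpha`, and `steps` are all hypothetical stand-ins chosen for this example.

```python
import random


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v; stands in for a toy linear 'image encoder'."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def embedding_loss(W, x, target):
    """Squared L2 distance between the toy embedding W @ x and a target embedding."""
    z = matvec(W, x)
    return sum((zi - ti) ** 2 for zi, ti in zip(z, target))


def pgd_embedding_attack(W, x, target, eps=8 / 255, alpha=1 / 255, steps=50):
    """Hypothetical targeted L-infinity PGD against the toy linear encoder.

    Minimizes ||W(x + delta) - target||^2 subject to |delta_i| <= eps
    (a small, 'imperceptible' budget) and 0 <= x_i + delta_i <= 1.
    """
    delta = [0.0] * len(x)
    for _ in range(steps):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        residual = [zi - ti for zi, ti in zip(matvec(W, x_adv), target)]
        # Gradient of the squared loss w.r.t. the input: 2 * W^T @ residual.
        grad = [2 * sum(W[r][c] * residual[r] for r in range(len(W)))
                for c in range(len(x))]
        for i, g in enumerate(grad):
            step = alpha * ((g > 0) - (g < 0))  # signed-gradient step
            d = max(-eps, min(eps, delta[i] - step))  # project onto the eps-ball
            d = max(-x[i], min(1.0 - x[i], d))  # keep pixel values in [0, 1]
            delta[i] = d
    return delta
```

A small usage example, again on made-up random data: the perturbation stays within the budget and lowers the distance to the target embedding.

```python
random.seed(0)
W = [[random.gauss(0, 1) for _ in range(16)] for _ in range(4)]
x = [random.random() for _ in range(16)]
target = matvec(W, [random.random() for _ in range(16)])
delta = pgd_embedding_attack(W, x, target)
x_adv = [a + b for a, b in zip(x, delta)]
print(embedding_loss(W, x, target), embedding_loss(W, x_adv, target))
```

The signed-gradient step paired with an L-infinity projection is the standard choice when the goal is a perturbation that is small in every pixel. A real image encoder is nonlinear and would need automatic differentiation rather than the closed-form gradient used here.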

