Generate "Normal", Edit Poisoned: Branding Injection via Hint Embedding in Image Editing
This work highlights a new attack vector for users of generative image editing services, though the threat is specific to multi-turn workflows and the success rates are moderate.
This paper identifies a security vulnerability in multi-turn image generation workflows: a hidden hint (e.g., a logo) embedded in an input image can be recognized and re-rendered by downstream generative models. The authors propose two attack scenarios, phishing-based and poison-based, which achieve average success rates of 44.4% and 32.2%, respectively, and develop a mitigation solution with average success rates of 87.4% and 92.3% against the two attacks, respectively.
With the rapid advancement of generative AI, users increasingly rely on image-generation models for image design and creation. To achieve faithful outputs, users typically engage in multi-turn interactions during image refinement: a text-to-image generation phase followed by a text-guided image-to-image editing phase. In this paper, we investigate a novel security vulnerability associated with this workflow. Our key insight is that a nearly invisible hint, such as branding information (e.g., a logo), embedded in an input image can be recognized by downstream generative models and subsequently re-rendered onto semantically related objects, even when the user prompt does not explicitly mention it. Because the payload is hidden, the attack is stealthy. We study two realistic attack scenarios. The first is a phishing-based setting, in which an attacker controls an online image generation service and injects hidden content into generated images before they are returned to users. The second is a poison-based setting, in which an attacker distributes a compromised text-to-image diffusion model whose outputs contain hidden content. We evaluate both attacks using six injected payloads, including well-known logos and customized designs, and demonstrate that the two attacks achieve average success rates of 44.4% and 32.2%, respectively, while keeping the injected logos visually imperceptible. We also develop a mitigation solution that achieves average success rates of 87.4% and 92.3% against the phishing-based and poison-based attacks, respectively.
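To make the notion of a "nearly invisible hint" concrete, the following is a minimal sketch of one generic way to blend a small pattern into an image at very low opacity and to measure how little the pixels change. It is not the authors' embedding method, which the abstract does not specify. The function names `embed_hint` and `psnr`, the opacity `alpha=0.02`, the random stand-in image, and the white-square placeholder "logo" are all assumptions chosen for illustration.

```python
import numpy as np

def embed_hint(image, hint, alpha=0.02, top=0, left=0):
    """Alpha-blend `hint` into `image` at (top, left) with low opacity.

    Hypothetical helper: plain alpha blending, used here only as a stand-in
    for whatever embedding the paper actually uses.
    """
    out = image.astype(np.float64).copy()
    h, w = hint.shape[:2]
    region = out[top:top + h, left:left + w]
    # Weighted average of the original pixels and the hint pixels.
    out[top:top + h, left:left + w] = (1.0 - alpha) * region + alpha * hint.astype(np.float64)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

def psnr(a, b):
    """Peak signal-to-noise ratio in dB between two 8-bit images."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(255.0 ** 2 / mse)

# Stand-in data (assumed): a random 64x64 RGB image and a 16x16 placeholder logo.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
hint = np.zeros((16, 16, 3), dtype=np.uint8)
hint[4:12, 4:12] = 255  # white square standing in for a logo

stego = embed_hint(image, hint, alpha=0.02, top=10, left=10)
max_change = int(np.max(np.abs(stego.astype(np.int16) - image.astype(np.int16))))
quality_db = psnr(image, stego)
```

Under these assumptions, each pixel shifts by only a few intensity levels out of 255, which is why such a hint can escape casual visual inspection. Whether a downstream editing model actually picks it up is the empirical question the paper evaluates.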