CV · AI · Jul 7, 2025

Losing Control: Data Poisoning Attack on Guided Diffusion via ControlNet

arXiv:2507.04726v1 · 1 citation · h-index: 8 · Has Code
Originality: Incremental advance
AI Analysis

This work reveals a critical security flaw in open-source ControlNet pipelines that puts users relying on community-shared data at risk. Its originality is incremental: it applies an existing attack type (data poisoning) to a new model context.

The paper tackles the vulnerability of ControlNet-guided diffusion models to stealthy data poisoning attacks, in which poisoned training samples cause the model to generate NSFW images without any text trigger while preserving fidelity on clean prompts. The attack achieves a high attack success rate on large-scale datasets.
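To make the two properties in this summary concrete (a high attack success rate on triggered inputs, fidelity on clean ones), here is a minimal sketch of how such an evaluation might be tallied. It assumes each generated image has already been labeled by some content detector; `EvalResult`, `attack_success_rate`, and `clean_false_trigger_rate` are hypothetical names, and the paper may measure clean-prompt fidelity differently (for example with image-quality metrics) rather than with a false-trigger rate.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EvalResult:
    """Per-image outcome from a hypothetical content detector."""
    triggered: bool       # was the trigger present in the conditioning input?
    flagged_target: bool  # did the detector flag the output as the attack's target content?


def attack_success_rate(results: Sequence[EvalResult]) -> float:
    """Fraction of triggered inputs whose outputs were flagged as target content."""
    triggered = [r for r in results if r.triggered]
    if not triggered:
        raise ValueError("no triggered samples to evaluate")
    return sum(r.flagged_target for r in triggered) / len(triggered)


def clean_false_trigger_rate(results: Sequence[EvalResult]) -> float:
    """Fraction of clean (untriggered) inputs whose outputs were flagged; lower means stealthier."""
    clean = [r for r in results if not r.triggered]
    if not clean:
        raise ValueError("no clean samples to evaluate")
    return sum(r.flagged_target for r in clean) / len(clean)


# Toy labels, not results from the paper.
results = [
    EvalResult(triggered=True, flagged_target=True),
    EvalResult(triggered=True, flagged_target=True),
    EvalResult(triggered=True, flagged_target=False),
    EvalResult(triggered=True, flagged_target=True),
    EvalResult(triggered=False, flagged_target=False),
    EvalResult(triggered=False, flagged_target=False),
]
print(attack_success_rate(results))       # 0.75
print(clean_false_trigger_rate(results))  # 0.0
```

Keeping the two rates separate mirrors the attack's dual goal: a backdoor is only "stealthy" if the second number stays near zero while the first stays high.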

Text-to-image diffusion models have achieved remarkable success in translating textual prompts into high-fidelity images. ControlNets further extend these models by allowing precise, image-based conditioning (e.g., edge maps, depth, pose), enabling fine-grained control over structure and style. However, their dependence on large, publicly scraped datasets, together with the increasing use of community-shared data for fine-tuning, exposes them to stealthy data poisoning attacks. In this work, we introduce a novel data poisoning method that manipulates ControlNets into generating images containing specific content without any text triggers. By injecting poisoned samples, each pairing a subtly triggered input with an NSFW target, we obtain a model that retains clean-prompt fidelity yet reliably produces NSFW outputs when the trigger is present. On large-scale, high-quality datasets, our backdoor achieves a high attack success rate while remaining imperceptible in raw inputs. These results reveal a critical vulnerability in open-source ControlNet pipelines and underscore the need for robust data sanitization and defense mechanisms.
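The poisoning step described in the abstract can be pictured at the level of dataset bookkeeping. The toy sketch below works on identifiers only and performs no image processing: a fraction of training triples have their conditioning input marked with a trigger and their target swapped for the attacker's target image, while the text prompt is left untouched (hence "without any text triggers"). `Sample`, `apply_trigger`, `build_poisoned_mix`, and `poison_rate` are hypothetical names; the paper's actual trigger design and poisoning ratio are not given in this summary.

```python
import random
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Sample:
    prompt: str          # text prompt; never modified, so there is no text trigger
    control_id: str      # identifier of the conditioning image (edge/depth/pose map)
    target_id: str       # identifier of the image the model is trained to reproduce
    poisoned: bool = False


def apply_trigger(control_id: str) -> str:
    """Hypothetical stand-in for embedding a subtle trigger in the conditioning image."""
    return control_id + "+trigger"


def build_poisoned_mix(clean: List[Sample], attack_target_id: str,
                       poison_rate: float, seed: int = 0) -> List[Sample]:
    """Swap a fraction `poison_rate` of samples for (triggered control, same prompt, attack target)."""
    if not 0.0 <= poison_rate <= 1.0:
        raise ValueError("poison_rate must be in [0, 1]")
    rng = random.Random(seed)
    n_poison = round(poison_rate * len(clean))
    poison_idx = set(rng.sample(range(len(clean)), n_poison))
    mixed = []
    for i, s in enumerate(clean):
        if i in poison_idx:
            mixed.append(replace(s, control_id=apply_trigger(s.control_id),
                                 target_id=attack_target_id, poisoned=True))
        else:
            mixed.append(s)
    return mixed


clean = [Sample(f"a photo of object {i}", f"ctrl_{i}", f"img_{i}") for i in range(20)]
mixed = build_poisoned_mix(clean, "ATTACK_TARGET", poison_rate=0.1)
print(sum(s.poisoned for s in mixed))  # 2
```

Because prompts are unchanged and only a small fraction of samples is altered, a reviewer who inspects the text side of a shared dataset would see nothing unusual, which is the gap the abstract's call for data sanitization targets.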

