Semantic Surgery: Zero-Shot Concept Erasure in Diffusion Models
The work addresses the need for safer text-to-image generation by offering a practical way to erase harmful or unwanted concepts, though the contribution is incremental because it builds on existing erasure methods.
The paper tackles concept erasure in text-to-image diffusion models as a way to mitigate harmful content. It introduces Semantic Surgery, a training-free, zero-shot framework that dynamically neutralizes target concepts in prompts, and it reports superior results, such as a 93.58 H-score in object erasure and a reduction of explicit content to 1 instance.
Concept erasure in text-to-image diffusion models is crucial for mitigating harmful content, yet existing methods often compromise generative quality. We introduce Semantic Surgery, a novel training-free, zero-shot framework for concept erasure that operates directly on text embeddings before the diffusion process. It dynamically estimates the presence of target concepts in a prompt and performs a calibrated vector subtraction to neutralize their influence at the source, enhancing both erasure completeness and locality. The framework includes a Co-Occurrence Encoding module for robust multi-concept erasure and a visual feedback loop to address latent concept persistence. As a training-free method, Semantic Surgery adapts dynamically to each prompt, ensuring precise interventions. Extensive experiments on object, explicit content, artistic style, and multi-celebrity erasure tasks show our method significantly outperforms state-of-the-art approaches. We achieve superior completeness and robustness while preserving locality and image quality (e.g., 93.58 H-score in object erasure, reducing explicit content to just 1 instance, and 8.09 H_a in style erasure with no quality degradation). This robustness also allows our framework to function as a built-in threat detection system, offering a practical solution for safer text-to-image generation.
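The abstract's central step, estimating how strongly a target concept is present in a prompt and then subtracting a calibrated amount of it from the text embedding, can be illustrated with a toy sketch. This is not the paper's formulation: the sigmoid-of-cosine presence estimate, the projection-based subtraction, and the names `presence_score`, `erase_concept`, `sharpness`, and `threshold` are all assumptions made for illustration, and plain Python lists stand in for real text-encoder embeddings.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


def presence_score(prompt_emb, concept_emb, sharpness=20.0, threshold=0.3):
    """Hypothetical presence estimate in (0, 1).

    Assumption: presence is a sigmoid of the cosine similarity between
    the prompt embedding and the concept embedding.
    """
    cos = dot(prompt_emb, concept_emb) / (norm(prompt_emb) * norm(concept_emb))
    return 1.0 / (1.0 + math.exp(-sharpness * (cos - threshold)))


def erase_concept(prompt_emb, concept_emb, sharpness=20.0, threshold=0.3):
    """Subtract the concept direction, scaled by the presence estimate.

    Assumption: the calibrated subtraction removes the prompt's projection
    onto the unit concept direction, weighted by presence_score.
    """
    rho = presence_score(prompt_emb, concept_emb, sharpness, threshold)
    n = norm(concept_emb)
    unit = [c / n for c in concept_emb]
    coeff = rho * dot(prompt_emb, unit)
    return [p - coeff * u for p, u in zip(prompt_emb, unit)]


# Toy 3-dimensional embeddings (illustrative values, not real encoder output).
concept = [1.0, 0.0, 0.0]
related_prompt = [0.8, 0.6, 0.0]     # strongly aligned with the concept
unrelated_prompt = [0.1, 0.99, 0.0]  # almost orthogonal to the concept

erased_related = erase_concept(related_prompt, concept)
erased_unrelated = erase_concept(unrelated_prompt, concept)
```

In this toy setting the aligned prompt loses almost all of its component along the concept direction, while the nearly orthogonal prompt is left essentially unchanged. That is one way to picture the balance between erasure completeness and locality that the abstract describes, under the assumptions stated above.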