Forgetting is Competition: Rethinking Unlearning as Representation Interference in Diffusion Models
This work addresses challenges in copyright compliance, protected data mitigation, artist opt-outs, and policy-driven content updates for users of diffusion models, though the contribution is incremental because it builds on existing unlearning methods.
The paper tackles uneven concept removal and unintended forgetting during unlearning in text-to-image diffusion models. It introduces SurgUn, a surgical method that applies targeted weight-space updates to remove specific visual concepts while preserving generative quality, and reports high-precision unlearning across Stable Diffusion v1.5, SDXL, and SANA.
Unlearning in text-to-image diffusion models often leads to uneven concept removal and unintended forgetting of unrelated capabilities. This complicates tasks such as copyright compliance, protected data mitigation, artist opt-outs, and policy-driven content updates. As models grow larger and adopt more diverse architectures, achieving precise and selective unlearning while preserving generative quality becomes increasingly challenging. We introduce SurgUn (pronounced "Surgeon"), a surgical unlearning method that applies targeted weight-space updates to remove specific visual concepts in text-conditioned diffusion models. Our approach is motivated by retroactive interference theory, which holds that newly acquired memories can overwrite, suppress, or impede access to prior ones by competing for shared representational pathways. We adapt this principle to diffusion models through a novel training paradigm that induces retroactive concept interference, destabilizing only the target concept while preserving unrelated capabilities. SurgUn achieves high-precision unlearning across diverse settings. It performs strongly on compact U-Net-based models such as Stable Diffusion v1.5, scales effectively to the larger U-Net architecture SDXL, and extends to SANA, a Diffusion Transformer-based architecture that remains underexplored for unlearning.