LG, CV · May 22, 2025

When Are Concepts Erased From Diffusion Models?

arXiv:2505.17013v5 · 12 citations · h-index: 10
Originality: Synthesis-oriented
AI Analysis

This paper addresses the challenge of ensuring reliable concept removal in generative AI models, which is crucial for applications such as content moderation and privacy. The work is incremental, however, because it focuses on evaluation rather than proposing a new erasure method.

The paper tackles the problem of evaluating how thoroughly concept erasure methods remove target concepts from diffusion models, and finds that current approaches often fail to fully erase concepts, highlighting the need for more robust evaluations.

In concept erasure, a model is modified to selectively prevent it from generating a target concept. Despite the rapid development of new methods, it remains unclear how thoroughly these approaches remove the target concept from the model. We begin by proposing two conceptual models for the erasure mechanism in diffusion models: (i) interfering with the model's internal guidance processes, and (ii) reducing the unconditional likelihood of generating the target concept, potentially removing it entirely. To assess whether a concept has been truly erased from the model, we introduce a comprehensive suite of independent probing techniques: supplying visual context, modifying the diffusion trajectory, applying classifier guidance, and analyzing the model's alternative generations that emerge in place of the erased concept. Our results shed light on the value of exploring concept erasure robustness outside of adversarial text inputs, and emphasize the importance of comprehensive evaluations for erasure in diffusion models.
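One way to read mechanism (i) is through the standard classifier-free guidance combination, in which a guidance scale pushes the unconditional noise prediction toward the conditional one. The sketch below is a toy illustration only: the two-element vectors, the function name `cfg_noise`, and the assumption that erasure collapses the target-conditioned prediction onto the unconditional one are all hypothetical and not taken from the paper.

```python
def cfg_noise(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance: eps_uncond + scale * (eps_cond - eps_uncond)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

# Hypothetical toy noise predictions for a 2-dimensional "image".
eps_uncond = [0.0, 0.0]    # unconditional prediction (left untouched here)
eps_target = [1.0, -1.0]   # prediction conditioned on the target concept

# Assumed form of guidance interference: the prediction for the target
# prompt is collapsed onto the unconditional prediction.
eps_target_erased = list(eps_uncond)

original = cfg_noise(eps_uncond, eps_target, scale=7.5)
erased = cfg_noise(eps_uncond, eps_target_erased, scale=7.5)

print(original)  # guided toward the target concept
print(erased)    # guidance term vanishes; no pull toward the target
```

In this toy reading, only the text-conditioned branch changes, so the unconditional model is unaffected. That is the kind of situation the non-text probes listed in the abstract (visual context, trajectory modification, classifier guidance) are meant to examine. Mechanism (ii) would instead alter what the model generates without any conditioning.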

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
