LG · AI · CV · Jan 31, 2025

Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them

arXiv:2501.18950v3 · 29 citations · h-index: 24 · Has Code · ICLR
Originality: Highly original
AI Analysis

This paper addresses the risk of harmful content in AI-generated media, offering an incremental improvement over existing erasure techniques.

The paper tackles harmful content generation in diffusion models by improving concept erasure. It shows that fixed-target strategies are suboptimal and proposes an adaptive method that dynamically selects a target for each undesirable concept; according to the authors, this significantly outperforms state-of-the-art methods in preserving unrelated concepts while maintaining effective erasure.

Concept erasure has emerged as a promising technique for mitigating the risk of harmful content generation in diffusion models by selectively unlearning undesirable concepts. The common principle of previous works to remove a specific concept is to map it to a fixed generic concept, such as a neutral concept or just an empty text prompt. In this paper, we demonstrate that this fixed-target strategy is suboptimal, as it fails to account for the impact of erasing one concept on the others. To address this limitation, we model the concept space as a graph and empirically analyze the effects of erasing one concept on the remaining concepts. Our analysis uncovers intriguing geometric properties of the concept space, where the influence of erasing a concept is confined to a local region. Building on this insight, we propose the Adaptive Guided Erasure (AGE) method, which dynamically selects optimal target concepts tailored to each undesirable concept, minimizing unintended side effects. Experimental results show that AGE significantly outperforms state-of-the-art erasure methods on preserving unrelated concepts while maintaining effective erasure performance. Our code is published at https://github.com/tuananhbui89/Adaptive-Guided-Erasure.
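
The contrast between a fixed target and a per-concept target can be illustrated with a toy sketch. This is not the AGE selection rule, which the abstract does not spell out. It assumes, purely for illustration, that concepts have vector embeddings and that a reasonable stand-in for a "local" target is the nearest other concept by cosine similarity. The names `toy_embeddings`, `FIXED_TARGET`, and `pick_local_target` are hypothetical, and the vectors are made up.

```python
import math

# Fixed-target baseline described in the abstract: every erased concept
# is mapped to the same generic target, here the empty prompt.
FIXED_TARGET = ""

# Hypothetical 3-d concept embeddings, invented for this illustration.
toy_embeddings = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.3, 0.0],
    "truck": [0.0, 0.1, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pick_local_target(embeddings, to_erase):
    """Return the most similar other concept (an assumed heuristic, not AGE)."""
    source = embeddings[to_erase]
    scores = {
        name: cosine(source, vec)
        for name, vec in embeddings.items()
        if name != to_erase  # a concept cannot be its own target
    }
    return max(scores, key=scores.get)


for concept in toy_embeddings:
    local = pick_local_target(toy_embeddings, concept)
    print(f"{concept!r}: fixed target {FIXED_TARGET!r}, local target {local!r}")
```

The fixed strategy returns the same target for every concept, whereas the toy heuristic returns a different, nearby target per concept. How AGE actually scores and selects targets is described in the paper and the linked repository.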

Code Implementations: 1 repo
