CV · Jan 16, 2025

Erasing More Than Intended? How Concept Erasure Degrades the Generation of Non-Target Concepts

arXiv:2501.09833v2 · 9 citations · h-index: 20
Originality: Incremental advance
AI Analysis

This work addresses robustness issues in deploying concept erasure techniques in real-world applications and highlights critical side effects.

The paper examines how concept erasure in text-to-image models degrades non-target concepts. It shows that erasure causes unintended suppression and distortions, with findings based on the more than 100 concepts in the EraseBench benchmark.

Concept erasure techniques have recently gained significant attention for their potential to remove unwanted concepts from text-to-image models. While these methods often demonstrate promising results in controlled settings, their robustness in real-world applications and suitability for deployment remain uncertain. In this work, we (1) identify a critical gap in evaluating sanitized models, particularly in assessing their performance across diverse concept dimensions, and (2) systematically analyze the failure modes of text-to-image models post-erasure. We focus on the unintended consequences of concept removal on non-target concepts across different levels of interconnected relationships including visually similar, binomial, and semantically related concepts. To address this, we introduce EraseBench, a comprehensive benchmark for evaluating post-erasure performance. EraseBench includes over 100 curated concepts, targeted evaluation prompts, and a robust set of metrics to assess both effectiveness and side effects of erasure. Our findings reveal a phenomenon of concept entanglement, where erasure leads to unintended suppression of non-target concepts, causing spillover degradation that manifests as distortions and a decline in generation quality.

