CV | Mar 13, 2023

Erasing Concepts from Diffusion Models

arXiv:2303.07345v3 | 566 citations | h-index: 32
Originality: Incremental advance
AI Analysis

This work addresses safety concerns about AI-generated content by erasing unwanted concepts from a diffusion model's weights, so the removal is permanent. The advance is incremental, since it builds on existing fine-tuning techniques.

The authors address the removal of specific visual concepts from text-to-image diffusion models to prevent misuse. They propose a fine-tuning method that permanently erases concepts such as explicit content or artistic styles, and report performance on par with existing safety methods (Safe Latent Diffusion and censored training) when removing sexually explicit content.

Motivated by recent advancements in text-to-image diffusion, we study erasure of specific concepts from the model's weights. While Stable Diffusion has shown promise in producing explicit or realistic artwork, it has raised concerns regarding its potential for misuse. We propose a fine-tuning method that can erase a visual concept from a pre-trained diffusion model, given only the name of the style and using negative guidance as a teacher. We benchmark our method against previous approaches that remove sexually explicit content and demonstrate its effectiveness, performing on par with Safe Latent Diffusion and censored training. To evaluate artistic style removal, we conduct experiments erasing five modern artists from the network and conduct a user study to assess the human perception of the removed styles. Unlike previous methods, our approach can remove concepts from a diffusion model permanently rather than modifying the output at the inference time, so it cannot be circumvented even if a user has access to model weights. Our code, data, and results are available at https://erasing.baulab.info/
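The phrase "negative guidance as a teacher" can be illustrated with a small sketch. It assumes the teacher target takes a classifier-free-guidance-like form: the frozen model's unconditional noise prediction, pushed away from its concept-conditioned prediction by a scale `eta`. The function names, the `eta` parameter, and the toy numbers are hypothetical and not taken from the page above; this is an illustration of the idea, not the authors' implementation.

```python
# Illustrative sketch only: function names, `eta`, and the exact target
# form are assumptions, not the authors' code.

def negative_guidance_target(eps_uncond, eps_cond, eta=1.0):
    """Teacher target: start from the frozen model's unconditional noise
    prediction and move away from its concept-conditioned prediction."""
    return [u - eta * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def erasure_loss(eps_student_cond, eps_uncond, eps_cond, eta=1.0):
    """Mean squared error between the fine-tuned model's concept-conditioned
    prediction and the negatively guided teacher target."""
    target = negative_guidance_target(eps_uncond, eps_cond, eta)
    squared = [(s - t) ** 2 for s, t in zip(eps_student_cond, target)]
    return sum(squared) / len(squared)


# Toy numbers standing in for noise predictions at a single timestep.
frozen_uncond = [0.0, 1.0]   # frozen model, no prompt
frozen_cond = [1.0, 3.0]     # frozen model, prompted with the concept name
target = negative_guidance_target(frozen_uncond, frozen_cond, eta=1.0)

# Before fine-tuning the student matches the frozen model, so the loss is large.
loss_before = erasure_loss(frozen_cond, frozen_uncond, frozen_cond, eta=1.0)
# A student that reproduces the target exactly has zero loss.
loss_after = erasure_loss(target, frozen_uncond, frozen_cond, eta=1.0)
```

In an actual fine-tuning loop the two frozen predictions would presumably come from an untouched copy of the pre-trained model, with gradients flowing only through the model being edited. Because the change lands in the weights, no guidance step is needed at inference time.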

Code Implementations: 2 repos
