CV | Jan 11, 2024

Erasing Undesirable Influence in Diffusion Models

Jing Wu, Trung Le, Munawar Hayat, Mehrtash Harandi

arXiv:2401.05779v424.541 citations | h-index: 29 | Has Code | CVPR

Originality: Incremental advance

AI Analysis

This work addresses safety concerns in image generation for users of diffusion models, though the advance is incremental because it builds on existing mitigation techniques.

The paper tackles unintentional NSFW content generation in diffusion models by introducing EraseDiff, an algorithm that removes unwanted information while preserving model utility. The authors report that it strikes an optimal trade-off between erasure and utility in their experiments.

Diffusion models are highly effective at generating high-quality images but pose risks, such as the unintentional generation of NSFW (not safe for work) content. Although various techniques have been proposed to mitigate unwanted influences in diffusion models while preserving overall performance, achieving a balance between these goals remains challenging. In this work, we introduce EraseDiff, an algorithm designed to preserve the utility of the diffusion model on retained data while removing the unwanted information associated with the data to be forgotten. Our approach formulates this task as a constrained optimization problem using the value function, resulting in a natural first-order algorithm for solving the optimization problem. By altering the generative process to deviate away from the ground-truth denoising trajectory, we update parameters for preservation while controlling constraint reduction to ensure effective erasure, striking an optimal trade-off. Extensive experiments and thorough comparisons with state-of-the-art algorithms demonstrate that EraseDiff effectively preserves the model's utility, efficacy, and efficiency.
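The abstract's formulation (minimize a loss on retained data subject to a value-function constraint on the data to be forgotten, solved with a first-order update) can be illustrated on a toy problem. The sketch below is not the authors' implementation: the quadratic losses, the names `retain_loss_grad`, `forget_loss`, `erase_step`, and `run`, and the specific multiplier rule are all assumptions chosen for illustration, borrowed from generic first-order methods for value-function constraints.

```python
# Toy sketch (assumption, not the paper's code): minimize a "retain" loss f
# subject to the value-function constraint g(theta) - min_phi g(phi) <= 0,
# where g stands in for the "forget" objective.

def retain_loss_grad(theta):
    # Hypothetical retain loss f = 0.5 * (theta0^2 + (theta1 - 1)^2)
    return [theta[0], theta[1] - 1.0]

def forget_loss(theta):
    # Hypothetical forget objective g = 0.5 * (theta0 - 2)^2
    return 0.5 * (theta[0] - 2.0) ** 2

def forget_loss_grad(theta):
    return [theta[0] - 2.0, 0.0]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def erase_step(theta, lr=0.1, inner_steps=5, inner_lr=0.5, eta=1.0, eps=1e-12):
    # Approximate min_phi g(phi) with a few gradient steps started at theta.
    phi = list(theta)
    for _ in range(inner_steps):
        g_phi = forget_loss_grad(phi)
        phi = [p - inner_lr * gi for p, gi in zip(phi, g_phi)]

    # Value-function gap; phi is treated as a constant, so it drops out of
    # the gradient and the gap is kept here only for monitoring.
    gap = forget_loss(theta) - forget_loss(phi)
    grad_q = forget_loss_grad(theta)
    grad_f = retain_loss_grad(theta)

    # Multiplier chosen so the update reduces the constraint at a controlled
    # rate: <grad_q, direction> >= eta * ||grad_q||^2 (assumed rule).
    sq = dot(grad_q, grad_q)
    lam = max((eta * sq - dot(grad_f, grad_q)) / (sq + eps), 0.0)

    direction = [gf + lam * gq for gf, gq in zip(grad_f, grad_q)]
    new_theta = [t - lr * d for t, d in zip(theta, direction)]
    return new_theta, gap

def run(steps=300):
    theta, gap = [0.0, 0.0], 0.0
    for _ in range(steps):
        theta, gap = erase_step(theta)
    return theta, gap

if __name__ == "__main__":
    print(run())
```

In this toy setting the constraint pins the first coordinate near the minimizer of the forget objective, while the retain loss alone determines the second coordinate, so `run()` should drive `theta` close to (2, 1). Only first-order gradients are used; no second derivatives of either loss are needed.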

