LG, AI, CL | Dec 28, 2024

AdvAnchor: Enhancing Diffusion Model Unlearning with Adversarial Anchors

Mengnan Zhao, Lihe Zhang, Xingyi Yang, Tianhang Zheng, Baocai Yin

arXiv:2501.00054v1 | 16.98 citations | h-index: 4 | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses security concerns in text-to-image diffusion models by improving the unlearning of inappropriate concepts, though it is an incremental advance over existing fine-tuning methods.

The paper tackles the trade-off in diffusion model unlearning between erasing undesirable concepts and preserving other concepts. It proposes AdvAnchor, which generates adversarial anchors that closely resemble the embeddings of undesirable concepts while excluding their defining attributes. The authors report that AdvAnchor outperforms state-of-the-art methods in their experiments.

Security concerns surrounding text-to-image diffusion models have driven researchers to unlearn inappropriate concepts through fine-tuning. Recent fine-tuning methods typically align the prediction distributions of unsafe prompts with those of predefined text anchors. However, these techniques exhibit a considerable performance trade-off between eliminating undesirable concepts and preserving other concepts. In this paper, we systematically analyze the impact of diverse text anchors on unlearning performance. Guided by this analysis, we propose AdvAnchor, a novel approach that generates adversarial anchors to alleviate the trade-off issue. These adversarial anchors are crafted to closely resemble the embeddings of undesirable concepts to maintain overall model performance, while selectively excluding defining attributes of these concepts for effective erasure. Extensive experiments demonstrate that AdvAnchor outperforms state-of-the-art methods. Our code is publicly available at https://anonymous.4open.science/r/AdvAnchor.
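
The anchor-alignment idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how the adversarial anchors are optimized, so the anchor below is a hand-built stand-in. The linear "noise predictor", the `attribute_dir` direction, and all function names are assumptions made for illustration only. The sketch fine-tunes a trainable copy of a frozen model so that its prediction for an unsafe embedding matches the frozen model's prediction for an anchor that stays close to that embedding but has one hypothetical "defining attribute" component removed.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Toy stand-in for a frozen, pre-trained noise predictor: a linear map.
# (Assumption: a real diffusion model is a deep network, not a matrix.)
W_frozen = rng.normal(size=(dim, dim))
W = W_frozen.copy()  # trainable copy that gets fine-tuned


def predict(weights, embedding):
    """Toy prediction of the model for a given text embedding."""
    return weights @ embedding


def alignment_loss(weights, unsafe_emb, anchor_emb):
    """Squared distance between the fine-tuned prediction for the unsafe
    embedding and the frozen prediction for the anchor embedding."""
    diff = predict(weights, unsafe_emb) - predict(W_frozen, anchor_emb)
    return float(diff @ diff)


def alignment_grad(weights, unsafe_emb, anchor_emb):
    """Gradient of alignment_loss with respect to the trainable weights."""
    diff = predict(weights, unsafe_emb) - predict(W_frozen, anchor_emb)
    return 2.0 * np.outer(diff, unsafe_emb)


# Hypothetical unsafe-concept embedding and "defining attribute" direction.
unsafe_emb = rng.normal(size=dim)
attribute_dir = rng.normal(size=dim)
attribute_dir /= np.linalg.norm(attribute_dir)

# Hand-built stand-in anchor (NOT the paper's procedure): the unsafe
# embedding with its component along attribute_dir projected out.
anchor_emb = unsafe_emb - (unsafe_emb @ attribute_dir) * attribute_dir

initial_loss = alignment_loss(W, unsafe_emb, anchor_emb)

# Plain gradient descent; this step size halves the residual each step.
lr = 0.25 / (unsafe_emb @ unsafe_emb)
for _ in range(50):
    W -= lr * alignment_grad(W, unsafe_emb, anchor_emb)

final_loss = alignment_loss(W, unsafe_emb, anchor_emb)
print(f"alignment loss: {initial_loss:.4f} -> {final_loss:.2e}")
```

In this toy setting the anchor differs from the unsafe embedding only along the removed direction, so the fine-tuned map changes only as much as is needed to erase that component. That mirrors, in a simplified form, the motivation the abstract gives for anchors that stay close to the undesirable concept.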
