Targeted Unlearning with Single Layer Unlearning Gradient
SLUG provides a practical solution for targeted unlearning in models such as CLIP and Stable Diffusion, though the contribution is incremental because it builds on existing unlearning methods.
The paper tackles the problem of efficiently removing sensitive or unwanted content from trained models by proposing Single Layer Unlearning Gradient (SLUG), which updates only a single critical layer and achieves unlearning performance comparable to existing methods while requiring significantly fewer computational resources.
Machine unlearning methods aim to remove sensitive or unwanted content from trained models, but typically demand extensive model updates at significant computational cost while potentially degrading model performance on both related and unrelated tasks. We propose Single Layer Unlearning Gradient (SLUG) as an efficient method to unlearn targeted information by updating a single critical layer using a one-time gradient computation. SLUG uses layer importance and gradient alignment metrics to identify the optimal layer for targeted information removal while preserving model utility. We demonstrate the effectiveness of SLUG for CLIP, Stable Diffusion, and vision-language models (VLMs) in removing concrete concepts (e.g., identities and objects) and abstract concepts (e.g., artistic styles). On the UnlearnCanvas benchmark, SLUG achieves unlearning performance comparable to existing methods while requiring significantly fewer computational resources. Our proposed approach offers a practical solution for targeted unlearning that is computationally efficient and precise. Our code is available at https://github.com/CSIPlab/SLUG.
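The layer-selection and single-layer update described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation (see the linked repository for that). Everything below is an assumption made for illustration: the abstract does not define the metrics, so importance is taken here as the ratio of the forget-gradient norm to the weight norm, alignment as the absolute cosine similarity between forget and retain gradients, and the two are combined with a hypothetical score `importance * (1 - alignment)`. The function names, the toy layers, and the fixed step size are likewise hypothetical, and gradients are assumed to have been computed once beforehand.

```python
import math


def l2_norm(v):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = l2_norm(u) * l2_norm(v)
    return sum(a * b for a, b in zip(u, v)) / denom if denom > 0 else 0.0


def score_layers(weights, forget_grads, retain_grads):
    """Return {layer: (importance, alignment)} from precomputed gradients.

    Assumed definitions (not taken from the abstract):
      importance = ||forget gradient|| / ||layer weights||
      alignment  = |cos(forget gradient, retain gradient)|
    """
    scores = {}
    for name, w in weights.items():
        gf, gr = forget_grads[name], retain_grads[name]
        importance = l2_norm(gf) / (l2_norm(w) + 1e-12)
        alignment = abs(cosine(gf, gr))
        scores[name] = (importance, alignment)
    return scores


def select_layer(scores):
    """Pick the layer with high importance and low alignment.

    The product below is a hypothetical way to combine the two metrics.
    """
    return max(scores, key=lambda n: scores[n][0] * (1.0 - scores[n][1]))


def single_layer_update(weights, forget_grads, layer, step_size):
    """Take one gradient step on the chosen layer; leave all others untouched."""
    updated = dict(weights)
    updated[layer] = [
        w - step_size * g for w, g in zip(weights[layer], forget_grads[layer])
    ]
    return updated


# Toy example with two "layers" represented as flat weight vectors.
weights = {"layer_a": [1.0, 0.0], "layer_b": [1.0, 0.0]}
forget_grads = {"layer_a": [0.0, 2.0], "layer_b": [3.0, 0.0]}
retain_grads = {"layer_a": [1.0, 0.0], "layer_b": [1.0, 0.0]}

scores = score_layers(weights, forget_grads, retain_grads)
chosen = select_layer(scores)
updated = single_layer_update(weights, forget_grads, chosen, step_size=0.5)
print(chosen, updated[chosen])
```

In this toy setup, `layer_b` has the larger forget gradient, but that gradient points in the same direction as its retain gradient, so changing it would likely harm retained behavior. `layer_a` has a forget gradient orthogonal to its retain gradient and is selected instead. In practice the step size would need to be tuned rather than fixed.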