CV · AI · LG · Oct 30, 2024

CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP

arXiv:2410.23330v2 · 21 citations · h-index: 13 · ACL
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient data removal in multimodal models such as CLIP, which matters for privacy and compliance. The advance is incremental because it builds on existing unlearning methods.

The paper tackles the problem of unlearning specific visual-textual associations in the multimodal CLIP model. It introduces CLIPErase, which forgets designated associations in zero-shot tasks on the CIFAR-100 and Flickr30K datasets while preserving performance on retained data.

Machine unlearning (MU) has gained significant attention as a means to remove specific data from trained models without requiring a full retraining process. While progress has been made in unimodal domains like text and image classification, unlearning in multimodal models remains relatively underexplored. In this work, we address the unique challenges of unlearning in CLIP, a prominent multimodal model that aligns visual and textual representations. We introduce CLIPErase, a novel approach that disentangles and selectively forgets both visual and textual associations, ensuring that unlearning does not compromise model performance. CLIPErase consists of three key modules: a Forgetting Module that disrupts the associations in the forget set, a Retention Module that preserves performance on the retain set, and a Consistency Module that maintains consistency with the original model. Extensive experiments on the CIFAR-100 and Flickr30K datasets across four CLIP downstream tasks demonstrate that CLIPErase effectively forgets designated associations in zero-shot tasks for multimodal samples, while preserving the model's performance on the retain set after unlearning.
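
The three-module structure described in the abstract can be pictured as a weighted sum of three loss terms. The sketch below is illustrative only: the abstract does not give the loss definitions, so the specific forms (cosine similarity for forgetting and retention, mean squared error for consistency), the function names, and the weights `lam_f`, `lam_r`, and `lam_c` are all assumptions, not the authors' formulation. It works on plain lists of embedding vectors so that it runs without any deep-learning library.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def forgetting_loss(img_embs, txt_embs):
    # ASSUMED form: mean cosine similarity of paired forget-set samples.
    # Minimizing it pushes matched image/text embeddings apart.
    sims = [cosine(i, t) for i, t in zip(img_embs, txt_embs)]
    return sum(sims) / len(sims)


def retention_loss(img_embs, txt_embs):
    # ASSUMED form: 1 minus mean cosine similarity of paired retain-set samples.
    # Minimizing it keeps matched image/text embeddings aligned.
    sims = [cosine(i, t) for i, t in zip(img_embs, txt_embs)]
    return 1.0 - sum(sims) / len(sims)


def consistency_loss(new_embs, orig_embs):
    # ASSUMED form: mean squared error between the unlearned model's
    # embeddings and the original (frozen) model's embeddings.
    total, count = 0.0, 0
    for new, orig in zip(new_embs, orig_embs):
        for a, b in zip(new, orig):
            total += (a - b) ** 2
            count += 1
    return total / count


def total_loss(forget_img, forget_txt, retain_img, retain_txt,
               retain_img_orig, lam_f=1.0, lam_r=1.0, lam_c=1.0):
    # Hypothetical weighted combination of the three module objectives.
    return (lam_f * forgetting_loss(forget_img, forget_txt)
            + lam_r * retention_loss(retain_img, retain_txt)
            + lam_c * consistency_loss(retain_img, retain_img_orig))
```

In this sketch, a forget pair that is still perfectly aligned contributes the maximum forgetting loss, while a perfectly aligned retain pair whose embeddings match the original model contributes nothing. That mirrors the stated goal: disrupt forget-set associations without degrading the retain set.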

