LG · CR · CV · Dec 1, 2024

Perturb and Recover: Fine-tuning for Effective Backdoor Removal from CLIP

arXiv:2412.00727v2 · 7 citations · h-index: 21 · Has Code
Originality: Highly original
AI Analysis

This addresses a critical security vulnerability for users of vision-language models, offering a practical defense against backdoor attacks without requiring expensive retraining.

The paper tackles the problem of backdoor attacks in CLIP models by introducing PAR, a fine-tuning method that effectively removes backdoors while maintaining standard performance, achieving high removal rates across various attacks and encoders.

Vision-Language models like CLIP have been shown to be highly effective at linking visual perception and natural language understanding, enabling sophisticated image-text capabilities, including strong retrieval and zero-shot classification performance. Their widespread use, as well as the fact that CLIP models are trained on image-text pairs from the web, makes them both a worthwhile and a relatively easy target for backdoor attacks. As training foundation models such as CLIP from scratch is very expensive, this paper focuses on cleaning potentially poisoned models via fine-tuning. We first show that existing cleaning techniques are not effective against simple structured triggers used in Blended or BadNet backdoor attacks, exposing a critical vulnerability for potential real-world deployment of these models. Then, we introduce PAR, Perturb and Recover, a surprisingly simple yet effective mechanism to remove backdoors from CLIP models. Through extensive experiments across different encoders and types of backdoor attacks, we show that PAR achieves a high backdoor removal rate while preserving good standard performance. Finally, we illustrate that our approach is effective even with only synthetic text-image pairs, i.e., without access to real training data. The code and models are available at https://github.com/nmndeep/PerturbAndRecover.
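The abstract names two structured triggers, BadNet and Blended, and frames cleaning as fine-tuning. As a rough illustration only, the NumPy sketch below shows the textbook forms of these triggers (a pasted patch and an alpha-blended pattern), an attack-success-rate metric of the kind used to measure backdoor removal, and the standard symmetric CLIP contrastive loss that fine-tuning of CLIP typically optimizes. It does not implement PAR, since the abstract does not specify PAR's objective. All function names, the patch location, and the `alpha=0.2` default are illustrative assumptions, not the paper's settings; images are assumed to be float arrays in [0, 1] with shape (H, W, C).

```python
# Illustrative sketch; names and defaults are assumptions, not PAR's implementation.
import numpy as np


def badnet_trigger(image: np.ndarray, patch: np.ndarray, top: int, left: int) -> np.ndarray:
    """BadNet-style trigger: overwrite a fixed region of the image with a patch."""
    poisoned = image.copy()  # leave the clean image untouched
    h, w = patch.shape[:2]
    poisoned[top:top + h, left:left + w] = patch
    return poisoned


def blended_trigger(image: np.ndarray, pattern: np.ndarray, alpha: float = 0.2) -> np.ndarray:
    """Blended trigger: alpha-blend a full-size pattern over the whole image."""
    if pattern.shape != image.shape:
        raise ValueError("pattern must have the same shape as image")
    return (1.0 - alpha) * image + alpha * pattern


def attack_success_rate(predictions: np.ndarray, target_label: int) -> float:
    """Share of triggered inputs that the model assigns to the attacker's target class."""
    return float(np.mean(predictions == target_label))


def clip_contrastive_loss(img_emb: np.ndarray, txt_emb: np.ndarray,
                          temperature: float = 0.07) -> float:
    """Symmetric image-text InfoNCE loss over a batch of matched pairs (row i <-> row i)."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (batch, batch) scaled cosine similarities
    idx = np.arange(logits.shape[0])

    def cross_entropy(l: np.ndarray) -> float:
        l = l - l.max(axis=1, keepdims=True)  # subtract row max for numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return float(-log_probs[idx, idx].mean())  # the matching pair sits on the diagonal

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

For example, `badnet_trigger(img, np.ones((16, 16, 3)), top=208, left=208)` stamps a white 16x16 square into the bottom-right corner of a 224x224 image. A cleaning method is doing its job when the attack success rate on such triggered inputs drops while clean zero-shot accuracy stays close to that of the original model.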

Code Implementations: 1 repo