LG · Jun 17, 2021

Poisoning and Backdooring Contrastive Learning

arXiv:2106.09667v2 · 215 citations
AI Analysis

The paper reveals a critical security flaw in widely used contrastive learning methods, with consequences for the safety and robustness of applications that train on uncurated data.

The paper demonstrates that multimodal contrastive learning models like CLIP are vulnerable to backdoor and poisoning attacks: poisoning as little as 0.01% of the dataset lets an attacker cause misclassification of test images overlaid with a small patch, and targeted poisoning attacks require control of only 0.0001% of the dataset.

Multimodal contrastive learning methods like CLIP train on noisy and uncurated training datasets. This is cheaper than labeling datasets manually, and even improves out-of-distribution robustness. We show that this practice makes backdoor and poisoning attacks a significant threat. By poisoning just 0.01% of a dataset (e.g., just 300 images of the 3 million-example Conceptual Captions dataset), we can cause the model to misclassify test images by overlaying a small patch. Targeted poisoning attacks, whereby the model misclassifies a particular test input with an adversarially-desired label, are even easier, requiring control of 0.0001% of the dataset (e.g., just three out of the 3 million images). Our attacks call into question whether training on noisy and uncurated Internet scrapes is desirable.
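The figures in the abstract can be checked with a few lines of arithmetic, and the patch overlay can be illustrated on toy data. What follows is a minimal sketch, not the authors' code: the function names (`poison_budget`, `overlay_patch`, `make_poisoned_pairs`), the grayscale nested-list image format, the random patch placement, and the example caption are all assumptions made for illustration.

```python
import random


def poison_budget(dataset_size, percent):
    """Number of examples an attacker controls at a given poisoning rate (in percent)."""
    return round(dataset_size * percent / 100)


def overlay_patch(image, patch, top, left):
    """Return a copy of `image` with `patch` pasted at row `top`, column `left`.

    Both arguments are toy grayscale images stored as lists of rows (an assumption
    for this sketch; a real pipeline would use image tensors).
    """
    h, w = len(patch), len(patch[0])
    if top < 0 or left < 0 or top + h > len(image) or left + w > len(image[0]):
        raise ValueError("patch does not fit inside the image at this position")
    out = [row[:] for row in image]  # copy rows so the input is left untouched
    for dy in range(h):
        out[top + dy][left:left + w] = patch[dy]
    return out


def make_poisoned_pairs(images, patch, target_caption, rng):
    """Pair each patched image with a caption naming the attacker's target label.

    The patch position is drawn at random here; that choice is an assumption of
    this sketch rather than a detail stated in the abstract.
    """
    h, w = len(patch), len(patch[0])
    pairs = []
    for image in images:
        top = rng.randint(0, len(image) - h)      # randint is inclusive on both ends
        left = rng.randint(0, len(image[0]) - w)
        pairs.append((overlay_patch(image, patch, top, left), target_caption))
    return pairs


if __name__ == "__main__":
    print(poison_budget(3_000_000, 0.01))    # backdoor attack budget: 300
    print(poison_budget(3_000_000, 0.0001))  # targeted poisoning budget: 3

    blank = [[0] * 32 for _ in range(32)]    # toy 32x32 image
    trigger = [[255] * 4 for _ in range(4)]  # toy 4x4 patch
    poisoned = make_poisoned_pairs([blank] * 5, trigger,
                                   "a photo of a basketball",  # hypothetical target caption
                                   random.Random(0))
    print(len(poisoned))                     # 5
```

At test time, the same `overlay_patch` call would be applied to a clean image to trigger the backdoor in a model trained on such pairs.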

Code Implementations: 1 repo