Exploiting Supervised Poison Vulnerability to Strengthen Self-Supervised Defense
This work addresses security vulnerabilities in machine learning models, targeting applications that require reliable defenses against data poisoning attacks. It represents an incremental improvement over existing methods.
The paper studies availability poisons in self-supervised learning (SSL), showing that SSL often performs poorly against such attacks. It introduces VESPR, a defense that applies adversarial training to supervised learning in order to improve SSL robustness, raising the minimum ImageNet-100 test accuracy of poisoned models by 16% and the average by 9%.
Availability poisons exploit supervised learning (SL) algorithms by introducing class-related shortcut features into images, so that models trained on the poisoned data are useless on real-world data. Self-supervised learning (SSL), which uses augmentations to learn instance discrimination, is regarded as a strong defense against poisoned data. However, by extending the study of SSL across multiple poisons on the CIFAR-10 and ImageNet-100 datasets, we demonstrate that it often performs poorly, with accuracy far below that obtained by training on clean data. Leveraging the vulnerability of SL to poison attacks, we introduce adversarial training (AT) on SL to obfuscate poison features and guide robust feature learning for SSL. Our proposed defense, designated VESPR (Vulnerability Exploitation of Supervised Poisoning for Robust SSL), outperforms six previous defenses across seven popular availability poisons, boosting the minimum and average ImageNet-100 test accuracies of poisoned models by 16% and 9%, respectively. Through analysis and ablation studies, we elucidate the mechanisms by which VESPR learns robust class features.
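The two ingredients named in the abstract, a class-related shortcut poison and an adversarial perturbation computed against a supervised model, can be illustrated with a minimal NumPy sketch. This is not the VESPR implementation and reproduces none of the seven poisons evaluated in the paper. It assumes a toy poison that adds one fixed L-infinity-bounded sign pattern per class, and a standard projected gradient descent (PGD) attack against a linear softmax classifier. The function names (`classwise_poison`, `pgd_linf`), the budget `eps = 8/255`, the step size, and the iteration count are all illustrative assumptions.

```python
import numpy as np


def classwise_poison(images, labels, num_classes, eps=8 / 255, seed=0):
    """Toy availability poison (assumption): one fixed +/-eps pattern per class."""
    rng = np.random.default_rng(seed)
    # Every image of a class receives the same pattern, creating a shortcut feature.
    patterns = eps * rng.choice([-1.0, 1.0], size=(num_classes,) + images.shape[1:])
    return np.clip(images + patterns[labels], 0.0, 1.0)


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(W, x, y):
    """Mean cross-entropy of the linear classifier with logits x @ W."""
    probs = softmax(x @ W)
    return -np.log(probs[np.arange(len(y)), y] + 1e-12).mean()


def pgd_linf(W, x, y, eps=8 / 255, step=2 / 255, iters=5):
    """L-infinity PGD against a linear softmax classifier (illustrative only)."""
    onehot = np.eye(W.shape[1])[y]
    x_adv = x.copy()
    for _ in range(iters):
        # Gradient of each sample's cross-entropy with respect to its input.
        grad = (softmax(x_adv @ W) - onehot) @ W.T
        x_adv = x_adv + step * np.sign(grad)       # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)           # keep a valid pixel range
    return x_adv
```

In adversarial training, perturbed inputs such as the output of `pgd_linf` replace the original inputs when the model weights are updated. How VESPR couples this supervised step with SSL is described in the paper itself and is not reconstructed here.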