Headless Horseman: Adversarial Attacks on Transfer Learning Models
This paper addresses security vulnerabilities in transfer learning that matter to practitioners, but the contribution is incremental, as it builds on existing adversarial attack methods.
The authors study adversarial attacks on transfer learning models. They introduce headless attacks, which require only the victim's feature extractor, and a label-blind method, which needs no information about the victim's class-label space. The attack lowers the accuracy of a ResNet18 trained on CIFAR10 by over 40%.
Transfer learning facilitates the training of task-specific classifiers using pre-trained models as feature extractors. We present a family of transferable adversarial attacks against such classifiers, generated without access to the classification head; we call these \emph{headless attacks}. We first demonstrate successful transfer attacks against a victim network using \textit{only} its feature extractor. This motivates the introduction of a label-blind adversarial attack. This transfer attack method does not require any information about the class-label space of the victim. Our attack lowers the accuracy of a ResNet18 trained on CIFAR10 by over 40\%.
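To make the idea of an attack that uses only the feature extractor concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a toy feature extractor f(x) = tanh(Wx) in place of a pre-trained network, and it assumes the attacker's objective is to maximize the squared distance between clean and perturbed features inside an L-infinity ball, optimized with signed-gradient steps. The abstract does not specify the loss or optimizer, so both are illustrative choices. All names (`features`, `feature_distance_grad`, `headless_attack`) and parameters (`eps`, `step`, `iters`) are hypothetical. Note that no classification head and no labels appear anywhere in the code.

```python
import math
import random


def features(W, x):
    """Toy feature extractor (an assumption): f(x) = tanh(W x)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]


def feature_distance_grad(W, x_adv, clean_feats):
    """Gradient of ||f(x_adv) - clean_feats||^2 with respect to x_adv."""
    adv_feats = features(W, x_adv)
    # Chain rule through tanh: d/dz tanh(z) = 1 - tanh(z)^2.
    coeffs = [2.0 * (a - c) * (1.0 - a * a)
              for a, c in zip(adv_feats, clean_feats)]
    return [sum(coeffs[j] * W[j][i] for j in range(len(W)))
            for i in range(len(x_adv))]


def sign(v):
    """Return -1.0, 0.0, or 1.0 according to the sign of v."""
    return (v > 0) - (v < 0) + 0.0


def headless_attack(W, x, eps=0.1, step=0.02, iters=20, seed=0):
    """Perturb x to push its features away from the clean features.

    Uses only the feature extractor W: no classification head, no labels.
    The perturbation is kept inside an L-infinity ball of radius eps.
    """
    rng = random.Random(seed)
    clean_feats = features(W, x)
    # Random start: at x_adv == x the gradient of the distance is zero.
    x_adv = [xi + rng.uniform(-eps, eps) for xi in x]
    for _ in range(iters):
        grad = feature_distance_grad(W, x_adv, clean_feats)
        # Signed-gradient ascent step on the feature distance.
        x_adv = [xa + step * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back onto the eps-ball around the clean input.
        x_adv = [min(max(xa, xi - eps), xi + eps)
                 for xa, xi in zip(x_adv, x)]
    return x_adv
```

In a realistic setting, `features` would be replaced by the pre-trained network with its classification head removed, and the gradient would come from automatic differentiation rather than the hand-derived formula used here.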