CR, LG, ML · Apr 29, 2020

Perturbing Across the Feature Hierarchy to Improve Standard and Strict Blackbox Attack Transferability

arXiv:2004.14861v1 · 104 citations
Originality: Incremental advance
AI Analysis

This paper addresses blackbox adversarial attacks, a concern for the security and robustness of AI systems, and represents an incremental improvement in attack strategies.

The paper tackles targeted adversarial attacks on deep neural network image classifiers by perturbing feature representations across multiple layers of the source model, which improves transferability to other models. It reports state-of-the-art targeted transfer performance between ImageNet DNNs. When the source and target models are not assumed to share a training dataset and label space, it reports in some instances a 10x increase in targeted success rate relative to other blackbox transfer methods.

We consider the blackbox transfer-based targeted adversarial attack threat model in the realm of deep neural network (DNN) image classifiers. Rather than focusing on crossing decision boundaries at the output layer of the source model, our method perturbs representations throughout the extracted feature hierarchy to resemble other classes. We design a flexible attack framework that allows for multi-layer perturbations and demonstrates state-of-the-art targeted transfer performance between ImageNet DNNs. We also show the superiority of our feature space methods under a relaxation of the common assumption that the source and target models are trained on the same dataset and label space, in some instances achieving a $10\times$ increase in targeted success rate relative to other blackbox transfer methods. Finally, we analyze why the proposed methods outperform existing attack strategies and show an extension of the method in the case when limited queries to the blackbox model are allowed.
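As a rough illustration of what a multi-layer feature-space objective can look like, the sketch below is a generic formulation, not the paper's stated loss. All symbols are assumptions introduced here: $f_l(x)$ is the source model's feature map at layer $l$, $\mathcal{L}$ is the chosen set of layers, $s_l(\cdot, t)$ is a hypothetical per-layer score of how strongly features resemble target class $t$, $\lambda_l$ are layer weights, and the $\ell_\infty$ budget $\epsilon$ is an assumed perturbation constraint.

$$
\max_{\delta} \; \sum_{l \in \mathcal{L}} \lambda_l \, s_l\big(f_l(x + \delta),\, t\big)
\quad \text{subject to} \quad \|\delta\|_\infty \le \epsilon
$$

Under these assumptions, an output-layer attack corresponds to $\mathcal{L}$ containing only the final layer, while the multi-layer setting sums contributions from several depths of the feature hierarchy.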