LG, ML · Nov 17, 2019

Black-Box Adversarial Attack with Transferable Model-based Embedding

arXiv:1911.07140v2 · 132 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of query-efficient adversarial attacks for machine learning security, offering incremental improvements over existing methods.

The paper tackles black-box adversarial attacks by learning a low-dimensional embedding from a pretrained model to efficiently search for transferable perturbations, resulting in improved query efficiency and attack success rates across various datasets and defended networks.

We present a new method for black-box adversarial attack. Unlike previous methods that combined transfer-based and score-based methods by using the gradient or initialization of a surrogate white-box model, this method learns a low-dimensional embedding using a pretrained model and then performs an efficient search within the embedding space to attack an unknown target network. The method produces adversarial perturbations with high-level semantic patterns that are easily transferable. We show that this approach can greatly improve the query efficiency of black-box adversarial attacks across different target network architectures. We evaluate our approach on MNIST, ImageNet and the Google Cloud Vision API, obtaining a significant reduction in the number of queries. We also attack adversarially defended networks on CIFAR10 and ImageNet, where our method not only reduces the number of queries but also improves the attack success rate.
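The idea of searching in a low-dimensional embedding space rather than in pixel space can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the abstract does not name the search procedure, so an NES-style antithetic gradient estimate is assumed here; the learned embedding is replaced by a fixed random linear decoder `D`; and the unknown target is a toy linear classifier reachable only through `margin_fn`. All names (`latent_nes_attack`, `margin_fn`, `decode`) are hypothetical.

```python
import numpy as np


def latent_nes_attack(margin_fn, decode, x, latent_dim, eps=0.3,
                      sigma=0.1, lr=0.5, pop=10, max_steps=200, seed=0):
    """Search a latent z so that x + eps * tanh(decode(z)) is misclassified.

    margin_fn is the only access to the target (black-box): it returns a
    score that stays positive while the input is still classified correctly.
    """
    rng = np.random.default_rng(seed)
    z = np.zeros(latent_dim)
    queries = 0

    def perturbed(latent):
        # tanh keeps every coordinate of the perturbation inside (-eps, eps).
        return x + eps * np.tanh(decode(latent))

    for _ in range(max_steps):
        grad = np.zeros(latent_dim)
        for u in rng.standard_normal((pop, latent_dim)):
            # Antithetic pair: two target queries per sampled direction.
            diff = (margin_fn(perturbed(z + sigma * u))
                    - margin_fn(perturbed(z - sigma * u)))
            grad += diff * u
            queries += 2
        grad /= 2 * sigma * pop
        z = z - lr * grad  # step against the estimated margin gradient
        queries += 1       # one extra query to check for success
        if margin_fn(perturbed(z)) <= 0:
            return perturbed(z), queries, True
    return perturbed(z), queries, False


# Toy setup (all of it assumed for illustration only).
rng = np.random.default_rng(1)
dim, latent_dim = 64, 8
w = rng.standard_normal(dim)                 # hidden weights of the toy target
x = rng.standard_normal(dim)                 # clean input
label = 1.0 if w @ x >= 0 else -1.0          # the target's clean prediction
D = rng.standard_normal((dim, latent_dim))   # stand-in for a learned decoder


def margin_fn(v):
    return label * float(w @ v)


def decode(z):
    return D @ z


x_adv, queries, success = latent_nes_attack(margin_fn, decode, x, latent_dim)
print(success, queries, float(np.max(np.abs(x_adv - x))))
```

Each search step here costs `2 * pop + 1` queries regardless of the input dimension, which is the intuition behind searching in a small latent space; whether the search succeeds depends on how well the decoder's range aligns with directions the target is sensitive to, which is what a learned, transferable embedding is meant to provide.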

Code Implementations: 1 repo
