Meta Adversarial Perturbations
This work addresses the computational cost of iterative adversarial attacks, offering machine learning security practitioners a faster way to generate strong attacks.
The paper tackles the problem of generating adversarial perturbations efficiently by introducing meta adversarial perturbations (MAP): better initializations that cause natural images to be misclassified with high probability after only a one-step gradient ascent update. Its experiments show that state-of-the-art deep neural networks are vulnerable to these perturbations, and that a single perturbation generalizes well across unseen data points and different neural network architectures.
A plethora of attack methods have been proposed to generate adversarial examples, among which iterative methods have demonstrated the ability to find strong attacks. However, computing an adversarial perturbation for a new data point requires solving a time-consuming optimization problem from scratch, and generating a stronger attack normally requires updating the data point for more iterations. In this paper, we show the existence of a meta adversarial perturbation (MAP), a better initialization that causes natural images to be misclassified with high probability after only a one-step gradient ascent update, and propose an algorithm for computing such perturbations. We conduct extensive experiments, and the empirical results demonstrate that state-of-the-art deep neural networks are vulnerable to meta perturbations. We further show that these perturbations are not only image-agnostic but also model-agnostic, as a single perturbation generalizes well across unseen data points and different neural network architectures.
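To make the "one-step gradient ascent update from a shared initialization" idea concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a toy linear softmax classifier, an L-infinity budget `eps`, and a signed (FGSM-style) step; the names `loss_and_grad`, `one_step_attack`, and `shared_init` are hypothetical, and `shared_init` is a random placeholder rather than a trained MAP.

```python
import numpy as np

def softmax_xent(logits, y):
    """Cross-entropy loss of one example with integer label y."""
    z = logits - logits.max()                 # stabilise the exponentials
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[y]

def loss_and_grad(x, y, W, b):
    """Loss and its gradient w.r.t. the input x for a linear softmax model (assumed toy model)."""
    logits = W @ x + b
    z = logits - logits.max()
    probs = np.exp(z) / np.exp(z).sum()
    onehot = np.zeros_like(probs)
    onehot[y] = 1.0
    grad_x = W.T @ (probs - onehot)           # d(loss)/dx for linear logits
    return softmax_xent(logits, y), grad_x

def one_step_attack(x, y, W, b, init, eps, step):
    """Take one signed gradient ascent step starting from x + init.

    `init` plays the role of a shared initialization; the result is
    clipped so the final perturbation stays inside the L-infinity eps-ball.
    """
    _, g = loss_and_grad(x + init, y, W, b)
    delta = init + step * np.sign(g)
    return np.clip(delta, -eps, eps)

# Toy usage with random data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 8)), np.zeros(3)
x, y = rng.normal(size=8), 1
eps = 0.1
shared_init = rng.uniform(-eps, eps, size=8)  # placeholder, NOT a trained MAP
delta = one_step_attack(x, y, W, b, shared_init, eps, step=eps)
loss_before, _ = loss_and_grad(x + shared_init, y, W, b)
loss_after, _ = loss_and_grad(x + delta, y, W, b)
print(loss_before, loss_after)
```

In this sketch the same `shared_init` would be reused for every new input, so each attack costs a single gradient evaluation; how such an initialization is actually computed is the subject of the paper's algorithm and is not reproduced here.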