Generating Adversarial Attacks in the Latent Space
This paper addresses the challenge of creating imperceptible adversarial examples for deep learning networks. The contribution is incremental: it builds on existing adversarial attack methods by moving the perturbation from the pixel space to the latent space.
The paper tackles the problem of generating adversarial attacks by injecting perturbations in the latent space using a generative adversarial network, which eliminates the need for margin-based priors. It demonstrates effectiveness across multiple datasets, with high visual realism compared to pixel-based methods.
Adversarial attacks in the input (pixel) space typically incorporate noise margins, such as an $L_1$- or $L_{\infty}$-norm bound, to produce imperceptibly perturbed data that confound deep learning networks. Such noise margins confine the magnitude of permissible noise. In this work, we propose injecting adversarial perturbations in the latent (feature) space using a generative adversarial network, removing the need for margin-based priors. Experiments on MNIST, CIFAR10, Fashion-MNIST, CIFAR100 and Stanford Dogs datasets support the effectiveness of the proposed method in generating adversarial attacks in the latent space while ensuring a high degree of visual realism with respect to pixel-based adversarial attack methods.
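To make the contrast between the two attack styles concrete, the following is a minimal sketch and not the paper's method. It assumes a toy linear "decoder" with its pseudo-inverse as the "encoder" (standing in for the generative adversarial network) and a toy linear classifier. All names (`W_dec`, `W_enc`, `w_clf`, `pixel_attack`, `latent_attack`) and all sizes and step values are hypothetical. The pixel-space attack is a sign-gradient step bounded by an $L_{\infty}$ margin `eps`; the latent-space attack perturbs the code `z` with no norm bound on the pixels and decodes the result.

```python
import numpy as np

rng = np.random.default_rng(0)
d_pixel, d_latent = 16, 4  # hypothetical toy dimensions

# Hypothetical stand-ins for a trained generator and encoder:
# a linear decoder and its pseudo-inverse.
W_dec = rng.normal(size=(d_pixel, d_latent))
W_enc = np.linalg.pinv(W_dec)

# Hypothetical linear classifier: score > 0 means class 1, else class 0.
w_clf = rng.normal(size=d_pixel)


def encode(x):
    return W_enc @ x


def decode(z):
    return W_dec @ z


def score(x):
    return float(w_clf @ x)


def pixel_attack(x, eps):
    """Sign-gradient step in pixel space, bounded by an L_inf margin eps."""
    # For a linear classifier the gradient of the score w.r.t. x is w_clf.
    direction = -np.sign(score(x)) * np.sign(w_clf)
    return x + eps * direction


def latent_attack(x, step):
    """Gradient step in latent space; no pixel-space norm bound is imposed."""
    z = encode(x)
    grad_z = W_dec.T @ w_clf  # chain rule: d score / d z
    z_adv = z - np.sign(score(decode(z))) * step * grad_z
    return decode(z_adv)


# A sample that lies in the decoder's range (i.e. on the toy "data manifold").
x = decode(rng.normal(size=d_latent))
x_pix = pixel_attack(x, eps=0.1)
x_lat = latent_attack(x, step=0.5)

print("pixel attack  L_inf change:", np.max(np.abs(x_pix - x)))
print("latent attack L_inf change:", np.max(np.abs(x_lat - x)))
print("scores (clean, pixel, latent):", score(x), score(x_pix), score(x_lat))
```

In this toy setting the latent-space result is always a decoder output, which loosely mirrors the idea that realism comes from the generator rather than from a noise margin. A real implementation would replace the linear maps with trained networks and use automatic differentiation for the gradients.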