Semantic Preserving Adversarial Attack Generation with Autoencoder and Genetic Algorithm
This work addresses robustness issues in deep learning models for security applications, but it is incremental: it builds on existing attack methods and adds a semantic focus.
The paper tackles the problem of adversarial attacks breaking data semantics by proposing a black-box attack that modifies latent features extracted by an autoencoder and measures noise in semantic space. It reports a 100% attack success rate on the first 100 samples of the MNIST and CIFAR-10 datasets, with less perturbation than FGSM.
Widely used deep learning models are found to have poor robustness. Small amounts of noise can fool state-of-the-art models into making incorrect predictions. While there are many high-performance attack generation methods, most of them directly add perturbations to the original data and measure them using L_p norms; this can break the major structure of the data, thus creating invalid attacks. In this paper, we propose a black-box attack which, instead of modifying the original data, modifies latent features of the data extracted by an autoencoder; we then measure noise in semantic space to protect the semantics of the data. We trained autoencoders on the MNIST and CIFAR-10 datasets and found optimal adversarial perturbations using a genetic algorithm. Our approach achieved a 100% attack success rate on the first 100 samples of the MNIST and CIFAR-10 datasets with less perturbation than FGSM.
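The general idea of searching for a small latent-space perturbation with a genetic algorithm can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the linear "autoencoder" (`W`), the two-class classifier (`C`), the fitness function (latent L2 norm plus a failure penalty), and all hyperparameters are assumptions made for this example, since the abstract does not specify them.

```python
import numpy as np

# Hypothetical toy "autoencoder": keep the first two of four input features.
W = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])

# Hypothetical black-box classifier: predicts class 1 iff x[0] + x[1] > 0.
C = np.array([[-1.0, -1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0]])

def encode(x):
    """Map an input to its latent features."""
    return W @ x

def decode(z):
    """Map latent features back to input space."""
    return W.T @ z

def black_box_predict(x):
    """Return only the predicted label, as a black-box model would."""
    return int(np.argmax(C @ x))

def fitness(delta, z, true_label, penalty=100.0):
    """Latent-space L2 norm of delta, plus a penalty if the attack fails (assumed form)."""
    fooled = black_box_predict(decode(z + delta)) != true_label
    return float(np.linalg.norm(delta)) + (0.0 if fooled else penalty)

def genetic_attack(z, true_label, pop_size=60, generations=50, sigma=1.0, seed=0):
    """Search for a small latent perturbation that changes the predicted label."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(0.0, sigma, size=(pop_size, z.size))
    n_elite = pop_size // 4
    history = []  # best fitness seen at each generation
    for _ in range(generations):
        scores = np.array([fitness(d, z, true_label) for d in pop])
        history.append(float(scores.min()))
        # Selection: keep the quarter of the population with the lowest fitness.
        elite = pop[np.argsort(scores)[:n_elite]]
        # Uniform crossover between two randomly chosen elite parents.
        pa = elite[rng.integers(n_elite, size=pop_size)]
        pb = elite[rng.integers(n_elite, size=pop_size)]
        children = np.where(rng.random(pop.shape) < 0.5, pa, pb)
        # Gaussian mutation.
        children = children + rng.normal(0.0, 0.1 * sigma, size=children.shape)
        # Elitism: carry the elite over unchanged so the best score never worsens.
        children[:n_elite] = elite
        pop = children
    scores = np.array([fitness(d, z, true_label) for d in pop])
    history.append(float(scores.min()))
    return pop[int(np.argmin(scores))], history

x = np.array([0.3, 0.2, 0.1, -0.1])
z = encode(x)
label = black_box_predict(decode(z))  # label of the reconstructed input
delta, history = genetic_attack(z, label)
print("latent perturbation:", delta)
print("adversarial label:", black_box_predict(decode(z + delta)))
```

The attack only queries `black_box_predict` for labels, which matches the black-box setting, and the perturbation is measured on `delta` in latent space rather than on the decoded input. A real experiment would replace `encode`/`decode` with a trained autoencoder and `black_box_predict` with the target model.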