CV · CR · LG · Apr 16, 2019

AT-GAN: An Adversarial Generator Model for Non-constrained Adversarial Examples

arXiv:1904.07793v4 · 41 citations
Originality Highly original
AI Analysis

This work addresses a key limitation in adversarial machine learning by enabling non-constrained attacks, which could impact security evaluations for AI systems, though it is incremental in advancing beyond perturbation-based methods.

The paper tackles the problem of generating adversarial examples without input constraints by proposing AT-GAN, a framework that learns the distribution of adversarial examples from scratch, resulting in more realistic examples and higher attack success rates against adversarially trained models.

Despite the rapid development of adversarial machine learning, most research on adversarial attacks and defenses focuses on perturbation-based adversarial examples, which are constrained by the input images. In contrast to existing work, we propose non-constrained adversarial examples, which are generated entirely from scratch without any constraint on the input. Unlike perturbation-based attacks, or so-called unrestricted adversarial attacks, which are still constrained by the input noise, we aim to learn the distribution of adversarial examples in order to generate non-constrained but semantically meaningful adversarial examples. In this spirit, we propose a novel attack framework called AT-GAN (Adversarial Transfer on Generative Adversarial Net). Specifically, we first train a normal GAN to learn the distribution of benign data, and then transfer the pre-trained GAN to estimate the distribution of adversarial examples for the target model. In this way, AT-GAN learns a distribution of adversarial examples that is very close to the distribution of real data. To our knowledge, this is the first work to build an adversarial generator model that can produce adversarial examples directly from any input noise. Extensive experiments and visualizations show that AT-GAN can efficiently generate diverse adversarial examples that appear more realistic to human perception. In addition, AT-GAN yields higher attack success rates against adversarially trained models in the white-box setting and exhibits moderate transferability to black-box models.
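The two-stage idea in the abstract (pre-train a generator on benign data, then transfer it toward the adversarial distribution of a target model) can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration, not the authors' implementation: the "pre-trained GAN" is replaced by a fixed linear generator on 2-D points, the target model is a hypothetical linear classifier, and the transfer loss (an adversarial term plus a weighted squared distance to the original generator's output, with weight `alpha`) is one plausible way to keep the new samples close to the benign ones. The names `generate`, `target_logit`, `transfer`, and `fooling_rate` are all hypothetical.

```python
import math
import random


def generate(params, z):
    """Toy linear generator x = W z + b (stand-in for a trained GAN generator)."""
    W, b = params
    return [sum(W[i][j] * z[j] for j in range(2)) + b[i] for i in range(2)]


def target_logit(x):
    """Hypothetical target classifier: a positive logit means 'correctly classified'."""
    return x[0] - 1.0


def transfer(g_orig, steps=2000, lr=0.05, alpha=0.05, seed=0):
    """Stage 2 (assumed form): fine-tune a copy of g_orig by SGD on
    softplus(logit(x)) + alpha * ||x - x_orig||^2, for the same noise z."""
    rng = random.Random(seed)
    W = [row[:] for row in g_orig[0]]  # start from the pre-trained weights
    b = g_orig[1][:]
    for _ in range(steps):
        z = [rng.gauss(0, 1), rng.gauss(0, 1)]
        x_orig = generate(g_orig, z)
        x = generate((W, b), z)
        # d softplus(logit)/dx = sigmoid(logit) * d logit/dx, and d logit/dx = (1, 0)
        s = 1.0 / (1.0 + math.exp(-target_logit(x)))
        grad_x = [s + 2 * alpha * (x[0] - x_orig[0]),
                  2 * alpha * (x[1] - x_orig[1])]
        # Chain rule through the linear generator: dL/dW = grad_x z^T, dL/db = grad_x
        for i in range(2):
            for j in range(2):
                W[i][j] -= lr * grad_x[i] * z[j]
            b[i] -= lr * grad_x[i]
    return W, b


def fooling_rate(params, n=1000, seed=1):
    """Fraction of freshly generated samples that the target classifier gets wrong."""
    rng = random.Random(seed)
    fooled = 0
    for _ in range(n):
        z = [rng.gauss(0, 1), rng.gauss(0, 1)]
        if target_logit(generate(params, z)) < 0:
            fooled += 1
    return fooled / n


# Stage 1 is assumed already done: g_orig plays the role of the pre-trained generator.
g_orig = ([[0.3, 0.0], [0.0, 0.3]], [2.0, 0.0])
g_attack = transfer(g_orig)
print(fooling_rate(g_orig), fooling_rate(g_attack))
```

After the transfer step, sampling adversarial points only requires drawing fresh noise and calling `generate(g_attack, z)`, with no per-sample optimization. This mirrors the abstract's point about producing adversarial examples directly from any input noise. The value of `alpha` controls the trade-off between fooling the classifier and staying near the original generator's output, and it was picked by hand for this toy setting.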

