NE AI CV · Mar 28, 2017

Adversarial Transformation Networks: Learning to Generate Adversarial Examples

arXiv:1703.09387v1 · 35.6305 citations

Originality: Highly original

AI Analysis

This paper addresses the challenge of efficiently generating adversarial examples for machine learning security. It offers a novel approach that runs faster and produces more diverse outputs than existing methods based on per-image gradients or per-image optimization.

The paper tackles the problem of generating adversarial examples to attack deep neural networks. It introduces Adversarial Transformation Networks (ATNs): feed-forward networks, trained in a self-supervised manner, that are fast to execute and produce diverse adversarial outputs. The authors evaluate ATNs against a variety of MNIST classifiers and against Inception ResNet v2, a state-of-the-art ImageNet classifier at the time.

Multiple different approaches of generating adversarial examples have been proposed to attack deep neural networks. These approaches involve either directly computing gradients with respect to the image pixels, or directly solving an optimization on the image pixels. In this work, we present a fundamentally new method for generating adversarial examples that is fast to execute and provides exceptional diversity of output. We efficiently train feed-forward neural networks in a self-supervised manner to generate adversarial examples against a target network or set of networks. We call such a network an Adversarial Transformation Network (ATN). ATNs are trained to generate adversarial examples that minimally modify the classifier's outputs given the original input, while constraining the new classification to match an adversarial target class. We present methods to train ATNs and analyze their effectiveness targeting a variety of MNIST classifiers as well as the latest state-of-the-art ImageNet classifier Inception ResNet v2.
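
The training objective described in the abstract (stay close to the original input and to the classifier's original outputs, while forcing the top class to become the adversarial target) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `rerank`, `l2_squared`, and `atn_loss`, the weights `alpha` and `beta`, and the choice of squared L2 distance for both terms are all hypothetical and do not appear in the summary above.

```python
# Sketch of an ATN-style training loss (pure Python, no ML framework).
# Assumption: all names and default weights here are illustrative only.

def rerank(y, target, alpha=1.5):
    """Build a training target from the classifier's original output y.

    The adversarial target class is set to alpha * max(y) so it becomes the
    top class (for alpha > 1); every other entry keeps its original value.
    The vector is then renormalized to sum to 1.
    """
    top = max(y)
    boosted = [alpha * top if k == target else p for k, p in enumerate(y)]
    total = sum(boosted)
    return [p / total for p in boosted]


def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def atn_loss(x, x_adv, y_adv, y_orig, target, alpha=1.5, beta=0.1):
    """Weighted sum of an input-space term and an output-space term.

    x      : original input (flattened)
    x_adv  : input produced by the transformation network
    y_adv  : classifier output on x_adv
    y_orig : classifier output on x
    """
    input_term = l2_squared(x_adv, x)                               # stay near x
    output_term = l2_squared(y_adv, rerank(y_orig, target, alpha))  # hit target
    return beta * input_term + output_term


if __name__ == "__main__":
    y_orig = [0.7, 0.2, 0.1]             # classifier currently predicts class 0
    goal = rerank(y_orig, target=2)      # class 2 should now rank first
    print(goal.index(max(goal)))         # index of the new top class
    print(atn_loss([0.0, 1.0], [0.1, 0.9], goal, y_orig, target=2))
```

In an actual training loop, `x_adv` would come from the transformation network and the loss would be minimized with respect to that network's weights while the target classifier stays fixed. Because the non-target entries of the reranked vector keep their original relative order, the second term expresses the "minimally modify the classifier's outputs" constraint.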
