NE · CV · LG · Dec 1, 2016

Adversarial Images for Variational Autoencoders

arXiv:1612.00155v1 · 91 citations
Originality: Incremental advance
AI Analysis

This work studies how vulnerable autoencoders are to adversarial attacks, an incremental contribution to understanding robustness in unsupervised learning models.

The paper investigates adversarial attacks on autoencoders and proposes a method that distorts an input image so that its reconstruction resembles a different target image. It finds that autoencoders are more robust than classifiers, with a quasi-linear trade-off between input distortion and similarity to the target. Experiments use the MNIST and SVHN datasets.
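The attack summarized above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the encoder is a toy `tanh` layer with random weights rather than a trained autoencoder, the names `encode` and `latent_attack` are hypothetical, and the objective (squared distance between latent codes plus a penalty `c` on the size of the distortion) stands in for whatever similarity measure the paper actually uses.

```python
import numpy as np

def encode(W, x):
    """Toy deterministic encoder (hypothetical stand-in for a trained one)."""
    return np.tanh(W @ x)

def latent_attack(W, x, x_target, c=0.1, lr=0.05, steps=500):
    """Gradient descent on ||encode(x + d) - encode(x_target)||^2 + c * ||d||^2."""
    z_target = encode(W, x_target)
    d = np.zeros_like(x)
    for _ in range(steps):
        z = encode(W, x + d)
        # Chain rule through tanh: d tanh(u) / du = 1 - tanh(u)^2
        grad_latent = W.T @ (2.0 * (z - z_target) * (1.0 - z ** 2))
        grad_penalty = 2.0 * c * d
        d -= lr * (grad_latent + grad_penalty)
    return d

rng = np.random.default_rng(0)
W = rng.normal(scale=0.3, size=(4, 16))   # assumed encoder weights
x = rng.uniform(size=16)                  # original input
x_target = rng.uniform(size=16)           # input whose latent code we imitate

baseline_gap = np.linalg.norm(encode(W, x) - encode(W, x_target))
results = {}
for c in (0.01, 0.1, 1.0):
    d = latent_attack(W, x, x_target, c=c)
    distortion = np.linalg.norm(d)
    latent_gap = np.linalg.norm(encode(W, x + d) - encode(W, x_target))
    results[c] = (distortion, latent_gap)
    print(f"c={c:<5} distortion={distortion:.3f} latent_gap={latent_gap:.3f}")
```

Sweeping `c` trades input distortion against closeness to the target's latent code, which is the kind of trade-off the summary describes. The toy model says nothing about the shape of that trade-off for real autoencoders.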

We investigate adversarial attacks for autoencoders. We propose a procedure that distorts the input image to mislead the autoencoder into reconstructing a completely different target image. We attack the internal latent representations, attempting to make the adversarial input produce an internal representation as similar as possible to the target's. We find that autoencoders are much more robust to the attack than classifiers: while some examples have tolerably small input distortion and reasonable similarity to the target image, there is a quasi-linear trade-off between those aims. We report results on the MNIST and SVHN datasets, and also test regular deterministic autoencoders, reaching similar conclusions in all cases. Finally, we show that the usual adversarial attack for classifiers, while much easier, also presents a direct proportion between distortion of the input and misdirection of the output. That proportionality, however, is hidden by the normalization of the output, which maps a linear layer into non-linear probabilities.
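The closing remark about output normalization can be seen in a small worked example. This is a sketch under stated assumptions, not the paper's experiment: the classifier is a single hand-picked linear layer `W` (hypothetical values), and the input is pushed along the direction that raises the target logit. The target logit grows in equal steps as the distortion grows, while the softmax probability follows a saturating curve.

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax over a 1-D vector of logits."""
    e = np.exp(v - v.max())
    return e / e.sum()

# Hypothetical linear output layer: 3 classes, 2 input features.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])
x = np.array([1.0, 0.0])
target = 1

# Unit direction that increases the target logit fastest.
direction = W[target] / np.linalg.norm(W[target])

eps = np.linspace(0.0, 4.0, 9)  # distortion magnitudes, step 0.5
logits = np.array([(W @ (x + e * direction))[target] for e in eps])
probs = np.array([softmax(W @ (x + e * direction))[target] for e in eps])

logit_steps = np.diff(logits)   # constant: logit is linear in eps
prob_steps = np.diff(probs)     # varying: softmax bends the curve

for e, l, p in zip(eps, logits, probs):
    print(f"eps={e:.1f} target_logit={l:.2f} target_prob={p:.3f}")
```

Reading only the probability column, the attack appears to succeed abruptly and then level off, whereas the logit column shows the same effect growing in direct proportion to the distortion.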

Code Implementations: 1 repo