CV · CR · LG · IV · Dec 10, 2019

Feature Losses for Adversarial Robustness

arXiv:1912.04497v1
AI Analysis

This paper addresses the vulnerability of deep learning models to adversarial inputs. It offers a preprocessing defense that an attacker cannot trivially train through end-to-end, though the contribution is incremental.

The paper tackles adversarial attacks on deep learning models by proposing a defense based on denoising autoencoders trained with perceptual losses on feature maps. The approach achieves close to state-of-the-art results on the MNIST and CIFAR10 datasets.
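As a rough illustration of what "perceptual losses on feature maps" means, a generic form of such a loss is written below. Here φ is a fixed feature-extraction network, D_θ is the denoising autoencoder with parameters θ, x_i is a clean image and x̃_i its perturbed version. This is an assumed textbook form, not necessarily the paper's exact objective; the authors may sum over several layers, weight them, or use a different distance.

```latex
\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N}
  \left\lVert \phi\big(D_\theta(\tilde{x}_i)\big) - \phi(x_i) \right\rVert_2^2
```

The autoencoder is thus judged by how closely the features of its output match the features of the clean input, not by pixel-space error.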

Deep learning has made tremendous advances in computer vision tasks such as image classification. However, recent studies have shown that deep learning models are vulnerable to specifically crafted adversarial inputs that are quasi-imperceptible to humans. In this work, we propose a novel approach to defending against adversarial attacks. We employ an input processing technique based on denoising autoencoders as a defense. It has been shown that input perturbations grow and accumulate as noise in feature maps while propagating through a convolutional neural network (CNN). We exploit these noisy feature maps by using an additional subnetwork to extract image feature maps and training an autoencoder on perceptual losses of these feature maps. This technique achieves close to state-of-the-art results when defending models on the MNIST and CIFAR10 datasets. More importantly, it shows a new way of employing a defense that cannot be trivially trained end-to-end by the attacker. Empirical results demonstrate the effectiveness of this approach on MNIST and CIFAR10 against simple as well as iterative L_p attacks. Our method can be applied as a preprocessing technique to any off-the-shelf CNN.
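The training idea described in the abstract can be sketched as a toy in NumPy. This is a sketch under stated assumptions, not the authors' implementation: a linear map `W_d` stands in for the denoising autoencoder, a frozen random ReLU layer `W_f` stands in for the feature-extraction subnetwork, and Gaussian noise stands in for adversarial perturbations. All function names are hypothetical. The point it shows is that the denoiser is updated only through the distance between feature maps of its output and of the clean input, while the feature extractor stays fixed.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def extract_features(x, W_f):
    """Hypothetical frozen feature extractor: one random linear layer + ReLU."""
    return relu(x @ W_f.T)


def denoise(x, W_d):
    """Hypothetical linear stand-in for the denoising autoencoder (preprocessing step)."""
    return x @ W_d.T


def feature_loss(W_d, W_f, x_noisy, x_clean):
    """Mean squared distance between features of the denoised and the clean inputs."""
    diff = extract_features(denoise(x_noisy, W_d), W_f) - extract_features(x_clean, W_f)
    return np.mean(diff ** 2)


def feature_loss_grad(W_d, W_f, x_noisy, x_clean):
    """Analytic gradient of feature_loss with respect to the denoiser weights W_d."""
    n, k = x_noisy.shape[0], W_f.shape[0]
    z = denoise(x_noisy, W_d) @ W_f.T              # pre-activations of the frozen extractor
    r = relu(z) - extract_features(x_clean, W_f)  # residual in feature space
    dz = (2.0 / (n * k)) * r * (z > 0)            # back through the mean and the ReLU
    d_denoised = dz @ W_f                          # gradient w.r.t. the denoised input
    return d_denoised.T @ x_noisy                  # gradient w.r.t. W_d


def train_denoiser(x_noisy, x_clean, W_f, lr=0.5, steps=1000):
    """Plain gradient descent on the feature loss; only W_d is updated, W_f stays frozen."""
    W_d = np.eye(x_noisy.shape[1])  # start from the identity map (no denoising)
    for _ in range(steps):
        W_d -= lr * feature_loss_grad(W_d, W_f, x_noisy, x_clean)
    return W_d


# Toy data (assumption): clean inputs lie on a 2-D subspace of R^8, and
# Gaussian noise plays the role of the adversarial perturbation.
rng = np.random.default_rng(0)
n, d, k = 200, 8, 16
basis = np.linalg.qr(rng.normal(size=(d, 2)))[0]
x_clean = rng.normal(size=(n, 2)) @ basis.T
x_noisy = x_clean + 0.5 * rng.normal(size=(n, d))
W_f = rng.normal(size=(k, d)) / np.sqrt(d)

W_d = train_denoiser(x_noisy, x_clean, W_f)
loss_before = feature_loss(np.eye(d), W_f, x_noisy, x_clean)
loss_after = feature_loss(W_d, W_f, x_noisy, x_clean)
print(f"feature loss: {loss_before:.4f} -> {loss_after:.4f}")
```

At inference time the trained denoiser would sit in front of an unchanged classifier, i.e. `classifier(denoise(x, W_d))`, which is what makes this kind of defense usable with an off-the-shelf model. In the paper that role is played by a convolutional autoencoder and the feature maps come from a CNN subnetwork; this toy reproduces neither, only the shape of the training signal.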
