CV, CR · Nov 16, 2021

Detecting AutoAttack Perturbations in the Frequency Domain

arXiv:2111.08785v3 · 15 citations
Originality: Incremental advance
AI Analysis

The paper offers an alternative defense for image classification networks against adversarial attacks: it detects manipulated inputs instead of hardening the network. The advance is incremental, but the reported detection accuracy is high.

The paper tackles the problem of defending against AutoAttack adversarial perturbations by proposing detection algorithms in the frequency domain, achieving 100% detection accuracy on CIFAR10 and up to 99.3% on ImageNet for epsilon = 8/255.

Recently, adversarial attacks on image classification networks by the AutoAttack (Croce and Hein, 2020b) framework have drawn a lot of attention. While AutoAttack has shown a very high attack success rate, most defense approaches focus on network hardening and robustness enhancements, such as adversarial training. With this approach, the best currently reported method can withstand about 66% of adversarial examples on CIFAR10. In this paper, we investigate the spatial and frequency domain properties of AutoAttack and propose an alternative defense. Instead of hardening a network, we detect adversarial attacks during inference and reject manipulated inputs. Based on a rather simple and fast analysis in the frequency domain, we introduce two different detection algorithms. The first is a black-box detector that operates only on the input images and achieves a detection accuracy of 100% on the AutoAttack CIFAR10 benchmark and 99.3% on ImageNet, for epsilon = 8/255 in both cases. The second is a white-box detector that analyzes CNN feature maps and reaches detection rates of 100% and 98.7% on the same two benchmarks.
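The black-box idea in the abstract (analyze only the input image in the frequency domain, then decide whether to reject it) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the features or the classifier, so the function names `magnitude_spectrum`, `high_frequency_energy`, and `is_flagged`, the `radius_fraction` parameter, the threshold rule, and the synthetic test image are all assumptions made for illustration. The random sign noise only stands in for an epsilon = 8/255 perturbation and is not an AutoAttack example.

```python
import numpy as np


def magnitude_spectrum(image):
    """Log-magnitude 2D Fourier spectrum per channel of an HxWxC image in [0, 1]."""
    spectrum = np.fft.fft2(image, axes=(0, 1))
    # Move the zero-frequency component to the center of the spectrum.
    spectrum = np.fft.fftshift(spectrum, axes=(0, 1))
    return np.log1p(np.abs(spectrum))


def high_frequency_energy(image, radius_fraction=0.25):
    """Hypothetical scalar feature: mean spectral magnitude outside a central disc."""
    spectrum = np.fft.fftshift(np.fft.fft2(image, axes=(0, 1)), axes=(0, 1))
    magnitude = np.abs(spectrum)
    h, w = image.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    distance = np.hypot(yy - h // 2, xx - w // 2)
    # Keep only frequencies far from the center (the high-frequency band).
    mask = distance > radius_fraction * min(h, w)
    return float(magnitude[mask].mean())


def is_flagged(image, threshold):
    """Assumed decision rule: reject inputs whose high-frequency energy is too large."""
    return high_frequency_energy(image) > threshold


# Toy data (assumption): a smooth synthetic 32x32x3 image, not a real CIFAR10 sample.
size = 32
coords = 2 * np.pi * np.arange(size) / size
pattern = 0.5 + 0.4 * np.outer(np.cos(coords), np.cos(coords))
clean = np.repeat(pattern[:, :, None], 3, axis=2)

# Stand-in perturbation with L-infinity norm 8/255 (random signs, NOT AutoAttack).
rng = np.random.default_rng(0)
epsilon = 8 / 255
perturbed = np.clip(clean + epsilon * rng.choice([-1.0, 1.0], size=clean.shape), 0.0, 1.0)

# In practice a threshold (or a learned classifier) would be fitted on labeled data;
# here it is simply the midpoint between the two toy feature values.
threshold = 0.5 * (high_frequency_energy(clean) + high_frequency_energy(perturbed))
print(is_flagged(clean, threshold), is_flagged(perturbed, threshold))
```

Natural images carry far more high-frequency content than this synthetic pattern, so a single hand-set threshold is unlikely to separate real clean and attacked inputs; a trained classifier over the full spectrum from `magnitude_spectrum` would be the more realistic choice.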

Code Implementations: 2 repos
