CV · Sep 4, 2021

Utilizing Adversarial Targeted Attacks to Boost Adversarial Robustness

arXiv:2109.01945v1 · 2 citations
AI Analysis

This work addresses adversarial robustness for deep learning models, offering an incremental improvement over existing defenses like adversarial training.

The paper addresses the degradation of deep neural network performance under adversarial attacks. It proposes a defense that runs a targeted adversarial attack for each label hypothesis and predicts the label by comparing the resulting hypothesis probabilities, reporting improvements of up to 5.7%, 3.7%, and 0.6% on ImageNet, CIFAR10, and MNIST, respectively.

Adversarial attacks have been shown to be highly effective at degrading the performance of deep neural networks (DNNs). The most prominent defense is adversarial training, a method for learning a robust model. Nevertheless, adversarial training does not make DNNs immune to adversarial perturbations. We propose a novel solution by adopting the recently suggested Predictive Normalized Maximum Likelihood. Specifically, our defense performs adversarial targeted attacks according to different hypotheses, where each hypothesis assumes a specific label for the test sample. Then, by comparing the hypothesis probabilities, we predict the label. Our refinement process corresponds to recent findings of the adversarial subspace properties. We extensively evaluate our approach on 16 adversarial attack benchmarks using ResNet-50, WideResNet-28, and a 2-layer ConvNet trained with ImageNet, CIFAR10, and MNIST, showing a significant improvement of up to 5.7%, 3.7%, and 0.6% respectively.
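
The hypothesis-comparison step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy linear softmax classifier in place of a DNN, a few sign-gradient steps inside an L-infinity ball as the targeted attack, and a simple normalization of the per-hypothesis probabilities. The names `targeted_refine` and `hypothesis_predict` and the parameters `eps` and `steps` are hypothetical, chosen only for illustration.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W, b, x):
    """Class probabilities of a toy linear softmax classifier (assumed model)."""
    logits = [sum(w * xj for w, xj in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)


def targeted_refine(W, b, x, target, eps=0.1, steps=5):
    """Targeted attack toward `target`: sign-gradient ascent on log p[target],
    kept inside an L-infinity ball of radius eps around x (assumed attack)."""
    step = eps / steps
    x_ref = list(x)
    for _ in range(steps):
        p = predict_proba(W, b, x_ref)
        # For a linear softmax model: d log p[target] / dx = W[target] - sum_k p[k] * W[k]
        grad = [
            W[target][j] - sum(p[k] * W[k][j] for k in range(len(W)))
            for j in range(len(x))
        ]
        x_ref = [xr + step * sign(g) for xr, g in zip(x_ref, grad)]
        # Project back into the eps-ball around the original input.
        x_ref = [min(max(xr, x0 - eps), x0 + eps) for xr, x0 in zip(x_ref, x)]
    return x_ref


def hypothesis_predict(W, b, x, eps=0.1, steps=5):
    """For each label hypothesis, refine x toward that label, record the
    probability the model assigns to it, then normalize and take the argmax."""
    scores = []
    for label in range(len(W)):
        x_ref = targeted_refine(W, b, x, label, eps, steps)
        scores.append(predict_proba(W, b, x_ref)[label])
    total = sum(scores)
    probs = [s / total for s in scores]
    return probs.index(max(probs)), probs
```

Calling `hypothesis_predict(W, b, x)` with a weight matrix `W` (one row per class), a bias list `b`, and an input `x` returns the predicted label together with the normalized hypothesis probabilities. In a realistic setting the linear model would be replaced by an adversarially trained network and the gradient would come from automatic differentiation.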

