Can Attention Masks Improve Adversarial Robustness?
This work addresses the vulnerability of DNNs to adversarial attacks, a critical security issue for AI systems, but the approach is incremental: it builds on prior work on pixel discretization.
The paper tackles adversarial robustness in deep neural networks. It hypothesizes that eliminating image backgrounds with attention masks improves robustness, and its initial results show an increase in adversarial robustness of over 20% on MS-COCO for adversarially trained classifiers.
Deep Neural Networks (DNNs) are known to be susceptible to adversarial examples. Adversarial examples are maliciously crafted inputs that are designed to fool a model but appear normal to human beings. Recent work has shown that pixel discretization can be used to make classifiers for MNIST highly robust to adversarial examples. However, pixel discretization fails to provide significant protection on more complex datasets. In this paper, we take the first step towards reconciling these contrary findings. Focusing on the observation that pixel discretization in MNIST makes the background completely black and the foreground completely white, we hypothesize that the important property for increasing robustness is the elimination of the image background using attention masks before classifying an object. To examine this hypothesis, we create foreground attention masks for two different datasets, GTSRB and MS-COCO. Our initial results suggest that using attention masks leads to improved robustness. For adversarially trained classifiers, we see an adversarial robustness increase of over 20% on MS-COCO.
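The two preprocessing ideas contrasted above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `discretize_pixels` and `apply_foreground_mask`, the 0.5 threshold, and the assumption that a binary foreground mask is already available (for example, derived from segmentation annotations) are all hypothetical choices made for illustration.

```python
import numpy as np


def discretize_pixels(image, threshold=0.5):
    """Binarize an image: pixels >= threshold become 1.0, all others 0.0.

    The threshold of 0.5 is an illustrative assumption, not a value
    taken from the paper.
    """
    return (image >= threshold).astype(np.float32)


def apply_foreground_mask(image, mask):
    """Zero out background pixels before classification.

    image: float array of shape (H, W, C) with values in [0, 1].
    mask:  array of shape (H, W), 1 for foreground and 0 for background
           (assumed to be given; how it is produced is out of scope here).
    """
    # Add a channel axis so the mask broadcasts over all color channels.
    return image * mask[..., np.newaxis].astype(image.dtype)


# Toy 2x2 RGB image with uniform intensity 0.8.
image = np.full((2, 2, 3), 0.8, dtype=np.float32)
# Hypothetical mask: only the diagonal pixels are foreground.
mask = np.array([[1, 0], [0, 1]])

masked = apply_foreground_mask(image, mask)
binary = discretize_pixels(np.array([0.1, 0.5, 0.9], dtype=np.float32))
```

In this sketch the masked image would be passed to the classifier in place of the original, so any perturbation an attacker places in the background is removed before it reaches the model.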