Localized Adversarial Training for Increased Accuracy and Robustness in Image Classification
This work addresses adversarial robustness in image classification and is aimed at machine learning practitioners. Its contribution is incremental: it extends existing adversarial training methods with a localized approach.
The paper tackles the vulnerability of image classifiers to adversarial examples. It develops a localized adversarial attack that alters image backgrounds and uses it in a new training technique. On the MNIST and CIFAR-10 datasets, the resulting classifier shows reduced accuracy loss on natural images and increased robustness against adversarial inputs.
Today's state-of-the-art image classifiers fail to correctly classify carefully manipulated adversarial images. In this work, we develop a new, localized adversarial attack that generates adversarial examples by imperceptibly altering the backgrounds of normal images. We first use this attack to highlight the unnecessary sensitivity of neural networks to changes in the background of an image, then use it as part of a new training technique: localized adversarial training. By including locally adversarial images in the training set, we are able to create a classifier that suffers less loss than a non-adversarially trained counterpart model on both natural and adversarial inputs. Evaluating our localized adversarial training algorithm on the MNIST and CIFAR-10 datasets shows decreased accuracy loss on natural images and increased robustness against adversarial inputs.
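The abstract does not specify the attack's exact formulation (loss, number of steps, or how the background region is found), so the following is only a minimal sketch under stated assumptions. It swaps in a single fast gradient sign method (FGSM) step whose perturbation is restricted by a binary background mask, applied to a toy logistic-regression classifier rather than a neural network. The helper names (`background_mask`, `localized_fgsm`, `localized_adversarial_training`), the dark-pixel threshold that defines the background, and the schedule of one SGD step on each natural image followed by one on its locally adversarial copy are all hypothetical illustrations, not the authors' method.

```python
import math

# Hypothetical sketch: masked (background-only) FGSM on a toy
# logistic-regression model. Not the paper's actual attack or training code.


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_prob(w, b, x):
    """P(class 1 | x) under a toy logistic-regression classifier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def input_gradient(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the pixels x."""
    err = predict_prob(w, b, x) - y
    return [err * wi for wi in w]


def background_mask(x, threshold=0.1):
    """Hypothetical background mask: 1.0 for dark pixels (MNIST-style background)."""
    return [1.0 if xi < threshold else 0.0 for xi in x]


def localized_fgsm(w, b, x, y, eps, mask):
    """One FGSM step applied only where mask == 1; pixels stay in [0, 1]."""
    grad = input_gradient(w, b, x, y)
    x_adv = []
    for xi, gi, mi in zip(x, grad, mask):
        # Move by eps in the direction of sign(gradient); a zero gradient means no change.
        step = eps * mi * math.copysign(1.0, gi) if gi != 0 else 0.0
        x_adv.append(min(1.0, max(0.0, xi + step)))
    return x_adv


def sgd_step(w, b, x, y, lr):
    """One SGD update of (w, b) on the cross-entropy loss of a single example."""
    err = predict_prob(w, b, x) - y
    w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w, b - lr * err


def localized_adversarial_training(data, n_pixels, epochs=50, lr=0.5, eps=0.3):
    """Hypothetical schedule: train on each natural image, then on its background-perturbed copy."""
    w, b = [0.0] * n_pixels, 0.0
    for _ in range(epochs):
        for x, y in data:
            w, b = sgd_step(w, b, x, y, lr)  # natural example
            x_adv = localized_fgsm(w, b, x, y, eps, background_mask(x))
            w, b = sgd_step(w, b, x_adv, y, lr)  # locally adversarial example
    return w, b


# Toy 4-pixel "images": pixel 0 lights up for class 1, pixel 1 for class 0.
data = [([1.0, 0.0, 0.0, 0.0], 1), ([0.0, 1.0, 0.0, 0.0], 0)]
w, b = localized_adversarial_training(data, n_pixels=4)

# Attack the class-1 image: only its dark (background) pixels may change.
x = data[0][0]
x_adv = localized_fgsm(w, b, x, 1, 0.3, background_mask(x))
```

The mask is what makes the attack "localized": the bright foreground pixel of the class-1 image is left untouched, while a dark background pixel can be brightened toward the other class's cue. Training on those perturbed copies is the sketch's stand-in for including locally adversarial images in the training set.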