Improving Adversarial Robustness by Encouraging Discriminative Features
This work addresses security and reliability issues in pattern recognition applications, but it is incremental, as it builds on existing methods for adversarial robustness.
The paper tackles the vulnerability of deep neural networks to adversarial examples by encouraging them to learn more discriminative features through a center loss combined with the softmax cross-entropy loss, which improves robustness on the MNIST, CIFAR-10, and CIFAR-100 datasets.
Deep neural networks (DNNs) have achieved state-of-the-art results in various pattern recognition tasks. However, they perform poorly on out-of-distribution adversarial examples, i.e., inputs that are specifically crafted by an adversary to cause DNNs to misbehave, which calls into question the security and reliability of applications. In this paper, we encourage DNN classifiers to learn more discriminative features by imposing a center loss in addition to the regular softmax cross-entropy loss. Intuitively, the center loss encourages DNNs to simultaneously learn a center for the deep features of each class and minimize the distances between the intra-class deep features and their corresponding class centers. We hypothesize that minimizing the distances between intra-class features while maximizing the distances between inter-class features would improve a classifier's robustness to adversarial examples. Our results on state-of-the-art architectures on MNIST, CIFAR-10, and CIFAR-100 confirm this hypothesis and highlight the importance of discriminative features.
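To make the combined objective concrete, the following is a minimal sketch of a joint loss that adds a center loss term to the softmax cross-entropy loss. It assumes the commonly used form of the center loss (half the squared Euclidean distance between a deep feature and the center of its class) and a weighting factor, here called `lam`. The abstract specifies neither, so the function names, the value of `lam`, and the averaging over the batch are illustrative assumptions rather than the paper's exact formulation. The sketch only evaluates the loss; in training, the centers would be learned together with the network.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one example, computed from raw logits.

    Uses the log-sum-exp trick (subtracting the max logit) for stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def center_loss(feature, center):
    """Half the squared Euclidean distance between a feature and its class center.

    Assumption: this is the commonly used form of the center loss.
    """
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def joint_loss(features, logits, labels, centers, lam=0.1):
    """Batch mean of cross-entropy plus lam times the center loss.

    `lam` is a hypothetical weighting hyperparameter; its value is not
    given in the abstract.
    """
    total = 0.0
    for x, z, y in zip(features, logits, labels):
        total += softmax_cross_entropy(z, y) + lam * center_loss(x, centers[y])
    return total / len(labels)


if __name__ == "__main__":
    # Two classes with 2-D deep features; all values are made up for illustration.
    centers = [[1.0, 0.0], [-1.0, 0.0]]
    features = [[1.0, 2.0], [-1.0, 0.0]]
    logits = [[2.0, 0.0], [0.0, 2.0]]
    labels = [0, 1]
    print(joint_loss(features, logits, labels, centers, lam=0.1))
```

In this sketch a feature that lies on its class center contributes nothing to the center loss term, so the term only penalizes intra-class spread. Separation between classes is left to the cross-entropy term.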