Deep Repulsive Prototypes for Adversarial Robustness
This paper addresses the challenge of adversarial attacks in machine learning by offering a more efficient alternative to adversarial training, though the contribution is incremental in that it builds on existing robustness methods.
The paper tackles the problem of adversarial robustness by training models with deep repulsive prototypes to achieve large class separation in the output space. The resulting models reach competitive robustness without adversarial training, for example over 50% robustness on CIFAR-10 with 92% natural accuracy.
While many defences against adversarial examples have been proposed, finding robust machine learning models is still an open problem. The most compelling defence to date is adversarial training, which consists of complementing the training data set with adversarial examples. Yet adversarial training severely impacts training time and depends on finding representative adversarial samples. In this paper we propose to train models on output spaces with large class separation in order to gain robustness without adversarial training. We introduce a method to partition the output space into class prototypes with large separation and train models to preserve it. Experimental results show that models trained with these prototypes, which we call deep repulsive prototypes, gain robustness competitive with adversarial training, while also preserving more accuracy on natural samples. Moreover, the models are more resilient to large perturbation sizes. For example, without adversarial training we obtained over 50% robustness for CIFAR-10, with 92% accuracy on natural samples, and over 20% robustness for CIFAR-100, with 71% accuracy on natural samples. For both data sets, the models preserved robustness against large perturbations better than adversarially trained models.
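The abstract does not spell out how the prototypes are built or which loss is used, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' procedure. It assumes unit-norm prototypes that are pushed apart by a simple iterative repulsion heuristic, a squared-distance training loss, and nearest-prototype prediction. All function names and hyperparameters (`make_repulsive_prototypes`, `beta`, `lr`, and so on) are hypothetical.

```python
import numpy as np


def make_repulsive_prototypes(num_classes, dim, steps=500, lr=0.1, beta=5.0, seed=0):
    """Spread num_classes unit vectors apart on the sphere (illustrative heuristic)."""
    rng = np.random.default_rng(seed)
    protos = rng.normal(size=(num_classes, dim))
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    for _ in range(steps):
        sim = protos @ protos.T          # pairwise cosine similarities
        np.fill_diagonal(sim, -np.inf)   # ignore self-similarity
        # Softmax weights: the most similar neighbours repel the most.
        weights = np.exp(beta * (sim - sim.max(axis=1, keepdims=True)))
        weights /= weights.sum(axis=1, keepdims=True)
        # Move each prototype away from the weighted mean of the others,
        # then project back onto the unit sphere.
        protos = protos - lr * (weights @ protos)
        protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    return protos


def max_cosine(protos):
    """Largest cosine similarity between two distinct prototypes (lower is better)."""
    sim = protos @ protos.T
    np.fill_diagonal(sim, -np.inf)
    return sim.max()


def prototype_loss(outputs, labels, protos):
    """Assumed training loss: mean squared distance to the true class prototype."""
    return float(np.mean(np.sum((outputs - protos[labels]) ** 2, axis=1)))


def predict(outputs, protos):
    """Assign each output vector to its nearest prototype."""
    dists = np.linalg.norm(outputs[:, None, :] - protos[None, :, :], axis=2)
    return dists.argmin(axis=1)


if __name__ == "__main__":
    init = make_repulsive_prototypes(10, 10, steps=0)   # random start, no repulsion
    protos = make_repulsive_prototypes(10, 10)
    print("max cosine before:", max_cosine(init))
    print("max cosine after: ", max_cosine(protos))
```

In this sketch the prototypes are fixed before training, and a network would then be trained so that its outputs minimise `prototype_loss`. The intuition is that a wider gap between prototypes means a larger perturbation of the output is needed before `predict` switches class.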