Towards Adversarial Training with Moderate Performance Improvement for Neural Network Classification
This work addresses a key limitation of adversarial training for neural network classification. It offers a moderate improvement for practitioners who need robust models without sacrificing accuracy on clean data.
The paper tackles the problem that adversarial training reduces accuracy on clean inputs. It proposes an approach that protects neural networks from adversarial samples while also improving accuracy on clean examples, and it demonstrates the approach's effectiveness across various networks and datasets.
It has been demonstrated that deep neural networks are vulnerable to noisy examples, particularly adversarial samples, during inference. A large gap remains between the robust deep learning systems that real-world applications require and today's vulnerable neural networks. Current adversarial training strategies improve robustness against adversarial samples. However, these methods reduce accuracy when the input examples are clean, which limits their practicality. In this paper, we investigate an approach that protects neural network classifiers from adversarial samples while also improving their accuracy on clean inputs. We demonstrate the versatility and effectiveness of the proposed approach on a variety of networks and datasets.