Wasserstein distributional adversarial training for deep neural networks
This work addresses adversarial robustness for deep neural networks, representing an incremental extension of existing adversarial training methods to distributional threats.
The paper tackles the problem of adversarial attacks on deep neural networks by extending TRADES adversarial training to handle distributional attacks using Wasserstein distributionally robust optimization. Experimental results show that the method enhances Wasserstein distributional robustness while maintaining pointwise robustness. In some cases it improves even models pre-trained on 20-100M synthetic images, although the fine-tuning uses only the 50k images of the original training set.
The design of adversarial attacks on deep neural networks, and of adversarial training methods to defend against them, is the subject of intense research. In this paper, we propose methods to train against distributional attack threats, extending the TRADES method used for pointwise attacks. Our approach leverages recent contributions and relies on sensitivity analysis for Wasserstein distributionally robust optimization problems. We introduce an efficient fine-tuning method which can be deployed on a previously trained model. We test our methods on a range of pre-trained models from RobustBench. The experimental results demonstrate that the additional training enhances Wasserstein distributional robustness while maintaining the original levels of pointwise robustness, even for networks that are already very successful. The improvements are less marked for models pre-trained using huge synthetic datasets of 20-100M images. Remarkably, however, our methods can sometimes still improve the performance of these models even when fine-tuning uses only the original training dataset (50k images).
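For orientation, the two ingredients named above can be written out explicitly. This is a sketch in notation assumed here rather than taken from the paper: $f_\theta$ denotes the network's predicted class probabilities, $L$ a classification loss, $P$ the data distribution, $\delta$ the attack budget, $\beta$ the TRADES trade-off weight, and $W_p$ the $p$-Wasserstein distance. The first expression is the standard TRADES objective for pointwise attacks; the second is the generic worst-case loss under a Wasserstein distributional threat. How exactly the paper combines the two is not specified in this abstract.

```latex
% Pointwise TRADES objective (standard form; notation assumed here)
\min_{\theta} \; \mathbb{E}_{(x,y)\sim P}\Big[ L\big(f_\theta(x), y\big)
  + \beta \max_{\|x'-x\|\le\delta} \mathrm{KL}\big(f_\theta(x)\,\|\,f_\theta(x')\big) \Big]

% Generic Wasserstein distributionally robust loss
\sup_{Q \,:\, W_p(P,Q)\le\delta} \; \mathbb{E}_{(x,y)\sim Q}\big[ L\big(f_\theta(x), y\big) \big]
```

Under a $W_p$ budget with finite $p$, the adversary may spend the budget unevenly, perturbing some inputs heavily and others barely at all, as long as the overall transport cost stays within $\delta$. The pointwise threat, in which every input is perturbed by at most $\delta$, corresponds to the case $p = \infty$.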