Improving Adversarial Robustness by Putting More Regularizations on Less Robust Samples
The paper addresses the critical problem of adversarial attacks in AI security, but its contribution is incremental, as it builds on existing adversarial training methods.
The paper tackles the problem of adversarial robustness in deep neural networks by proposing a new adversarial training algorithm that applies more regularization to vulnerable samples, achieving state-of-the-art performance with improvements in both generalization and robustness.
Adversarial training, which aims to enhance robustness against adversarial attacks, has received much attention because human-imperceptible perturbations that deceive a given deep neural network are easy to generate. In this paper, we propose a new adversarial training algorithm that is theoretically well motivated and empirically superior to existing algorithms. A novel feature of the proposed algorithm is that it applies more regularization to data vulnerable to adversarial attacks than existing regularization algorithms do. Theoretically, we show that our algorithm can be understood as minimizing a regularized empirical risk motivated by a newly derived upper bound of the robust risk. Numerical experiments illustrate that the proposed algorithm improves generalization (accuracy on clean examples) and robustness (accuracy under adversarial attacks) simultaneously, achieving state-of-the-art performance.
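To make "more regularization on less robust samples" concrete, here is a minimal Python sketch under stated assumptions. It is not the paper's exact objective. It starts from a TRADES-style loss (cross-entropy on the clean prediction plus a KL term between the clean and adversarial predictive distributions) and scales each sample's KL term by a hypothetical vulnerability weight w = 1 - p_adv(y), i.e. the probability mass the adversarial prediction moves away from the true label. The function names, the weight formula, and the default `lam` are illustrative assumptions. Generating the adversarial examples (for instance with PGD) is out of scope, so the sketch takes precomputed logits.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Log-sum-exp with the max subtracted, for numerical stability.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def kl_from_log_probs(log_p: Sequence[float], log_q: Sequence[float]) -> float:
    # KL(p || q) for two categorical distributions given as log-probabilities.
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def per_sample_losses(
    clean_logits: Sequence[Sequence[float]],
    adv_logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    lam: float = 6.0,  # hypothetical regularization strength
    weighted: bool = True,
) -> List[float]:
    """Hypothetical per-sample loss: CE(clean) + lam * w * KL(p_clean || p_adv).

    weighted=True uses the assumed vulnerability weight w = 1 - p_adv[y], so
    samples whose adversarial prediction drifts from the true label receive
    more regularization. weighted=False sets w = 1 for every sample
    (uniform weighting, as in a TRADES-style loss).
    """
    losses = []
    for z_clean, z_adv, y in zip(clean_logits, adv_logits, labels):
        lp_clean = log_softmax(z_clean)
        lp_adv = log_softmax(z_adv)
        ce = -lp_clean[y]
        w = 1.0 - math.exp(lp_adv[y]) if weighted else 1.0
        losses.append(ce + lam * w * kl_from_log_probs(lp_clean, lp_adv))
    return losses


def weighted_robust_loss(clean_logits, adv_logits, labels, lam=6.0, weighted=True) -> float:
    # Batch objective: mean of the per-sample losses.
    losses = per_sample_losses(clean_logits, adv_logits, labels, lam, weighted)
    return sum(losses) / len(losses)


if __name__ == "__main__":
    clean = [[2.0, 0.0], [2.0, 0.0]]
    adv = [[1.5, 0.0], [0.0, 1.5]]  # sample 0 stays correct; sample 1 is flipped
    labels = [0, 0]
    print(per_sample_losses(clean, adv, labels))
    print(weighted_robust_loss(clean, adv, labels, weighted=True))
    print(weighted_robust_loss(clean, adv, labels, weighted=False))
```

In the usage example both samples have identical clean logits, so any difference in their losses comes from the regularization term: the flipped sample gets a larger KL value and a larger weight. Setting `weighted=False` recovers uniform weighting, in which every sample's KL term counts equally.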