Randomized Adversarial Training via Taylor Expansion
This work addresses a problem faced by deep learning practitioners: balancing robustness against adversarial examples with accuracy on clean data. It represents an incremental improvement over existing adversarial training methods.
The paper tackles the robustness-accuracy trade-off in adversarial training for deep neural networks by introducing random noise into deterministic weights, an approach that flattens the loss landscape and helps the optimizer find flat minima. The method enhances state-of-the-art adversarial training methods, improving both robustness and clean accuracy in evaluations against PGD, CW, and AutoAttack.
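To make "adversarial training on randomized weights" concrete, here is a minimal NumPy sketch under stated assumptions: a linear (logistic-regression) classifier, an L∞ PGD inner attack without random start, and one Gaussian weight sample per training step. It is not the authors' algorithm. The paper builds its method on a Taylor expansion of the small Gaussian noise, which this sketch does not implement, and every name and hyperparameter here (`randomized_adv_train`, `pgd_attack`, `sigma`, `eps`, `alpha`, `lr`) is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(w, b, X, y):
    """Mean binary cross-entropy of a linear classifier.

    Returns the loss and its gradients w.r.t. the weights w, the bias b,
    and the inputs X (the input gradient drives the PGD attack).
    """
    p = sigmoid(X @ w + b)
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    r = (p - y) / len(y)  # dLoss/dlogit for each sample
    return loss, X.T @ r, r.sum(), np.outer(r, w)

def pgd_attack(w, b, X, y, eps=0.1, alpha=0.025, steps=10):
    """L-infinity PGD (no random start): signed-gradient ascent on the inputs,
    projected back into the eps-ball around the clean inputs."""
    X_adv = X.copy()
    for _ in range(steps):
        _, _, _, gX = loss_and_grads(w, b, X_adv, y)
        X_adv = np.clip(X_adv + alpha * np.sign(gX), X - eps, X + eps)
    return X_adv

def randomized_adv_train(X, y, sigma=0.01, lr=0.5, epochs=200, seed=0):
    """Hypothetical adversarial training loop with Gaussian-randomized weights.

    Assumption: one noise sample per step, a simplification that is not
    the paper's Taylor-expansion method.
    """
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        # Perturb the deterministic weights with small Gaussian noise.
        w_noisy = w + sigma * rng.standard_normal(w.shape)
        # Inner maximization: craft adversarial inputs against the randomized model.
        X_adv = pgd_attack(w_noisy, b, X, y)
        # Outer minimization: since w_noisy = w + noise, the gradient at w_noisy
        # is also the gradient w.r.t. w of the randomized adversarial loss.
        _, gw, gb, _ = loss_and_grads(w_noisy, b, X_adv, y)
        w, b = w - lr * gw, b - lr * gb
    return w, b

# Usage on toy data: two Gaussian blobs in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 0.5, (100, 2)), rng.normal(1.0, 0.5, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])
w, b = randomized_adv_train(X, y)
clean_acc = np.mean((X @ w + b > 0) == y)
robust_acc = np.mean((pgd_attack(w, b, X, y) @ w + b > 0) == y)
print(f"clean accuracy: {clean_acc:.3f}, PGD accuracy: {robust_acc:.3f}")
```

Setting `sigma=0.0` reduces the loop to ordinary PGD adversarial training on this toy model, which makes the effect of the weight randomization easy to ablate.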
In recent years, there has been an explosion of research into developing deep neural networks that are more robust against adversarial examples. Adversarial training has emerged as one of the most successful methods. To address both robustness against adversarial examples and accuracy on clean examples, many works develop enhanced adversarial training methods that achieve various trade-offs between the two. Building on studies showing that smoothed weight updates during training may help find flat minima and improve generalization, we suggest reconciling the robustness-accuracy trade-off from another perspective, i.e., by adding random noise to deterministic weights. The randomized weights enable our design of a novel adversarial training method via a Taylor expansion with respect to a small Gaussian noise, and we show that the new adversarial training method can flatten the loss landscape and find flat minima. With PGD, CW, and AutoAttack, an extensive set of experiments demonstrates that our method enhances state-of-the-art adversarial training methods, boosting both robustness and clean accuracy. The code is available at https://github.com/Alexkael/Randomized-Adversarial-Training.
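Why a Taylor expansion around small Gaussian weight noise relates to flat minima can be illustrated with a standard second-order identity (a textbook result, not the paper's derivation): for δ ~ N(0, σ²I), E[L(w + δ)] ≈ L(w) + (σ²/2) tr(∇²L(w)), because the first-order term δᵀ∇L(w) has zero mean. The extra term penalizes curvature, so minimizing the expected loss of randomized weights favors flat regions. The NumPy sketch below checks the identity numerically on a hypothetical toy loss with one flat and one sharp minimum; the loss, `sigma`, and helper names are assumptions for illustration only.

```python
import numpy as np
from functools import partial

def toy_loss(w, a):
    """Hypothetical non-quadratic loss: minimum at w = 0, curvature a[i] along axis i."""
    return np.sum(a * (1.0 - np.cos(w)), axis=-1)

def hessian_trace_fd(f, w, h=1e-4):
    """Trace of the Hessian of f at w via central second differences."""
    tr = 0.0
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = h
        tr += (f(w + e) - 2.0 * f(w) + f(w - e)) / h**2
    return tr

def noisy_loss_mc(f, w, sigma, n=200_000, seed=0):
    """Monte Carlo estimate of E[f(w + delta)] with delta ~ N(0, sigma^2 I)."""
    rng = np.random.default_rng(seed)
    delta = sigma * rng.standard_normal((n, w.size))
    return f(w + delta).mean()

sigma = 0.05  # assumed noise scale
w0 = np.zeros(3)
results = {}
for name, a in {"flat": np.full(3, 0.1), "sharp": np.full(3, 10.0)}.items():
    f = partial(toy_loss, a=a)
    results[name] = {
        # Second-order Taylor prediction: L(w) + sigma^2 / 2 * tr(H).
        "taylor": f(w0) + 0.5 * sigma**2 * hessian_trace_fd(f, w0),
        # Closed form for this loss, using E[cos(delta_i)] = exp(-sigma^2 / 2).
        "exact": np.sum(a) * (1.0 - np.exp(-0.5 * sigma**2)),
        # Sampling estimate of the expected loss under weight noise.
        "mc": noisy_loss_mc(f, w0, sigma),
    }
    print(name, results[name])
```

At the same noise level, the sharp minimum's expected loss comes out about 100 times larger than the flat minimum's, matching the ratio of their Hessian traces.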