Understanding Adversarial Training: Increasing Local Stability of Neural Nets through Robust Optimization
This work addresses the vulnerability of neural networks to adversarial examples, a critical security issue in machine learning applications. It builds on existing adversarial training methods.
The paper proposes a robust optimization framework for improving the local stability of neural networks. Training on adversarially perturbed examples hardens the network against adversarial examples, and the reported experiments show both increased robustness and improved accuracy on test data.
We propose a general framework for increasing local stability of Artificial Neural Nets (ANNs) using Robust Optimization (RO). We achieve this through an alternating minimization-maximization procedure, in which the loss of the network is minimized over perturbed examples that are generated at each parameter update. We show that adversarial training of ANNs is in fact robustification of the network optimization, and that our proposed framework generalizes previous approaches for increasing local stability of ANNs. Experimental results reveal that our approach increases the robustness of the network to existing adversarial examples, while making it harder to generate new ones. Furthermore, our algorithm also improves the accuracy of the network on the original test data.
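The alternating minimization-maximization procedure described above can be illustrated with a minimal sketch. It is not the paper's algorithm or code. It assumes a logistic-regression model as a stand-in for an ANN, an L-infinity ball of radius `radius` as the set of allowed perturbations, and a single signed-gradient step as an approximate inner maximization. All function names and hyperparameters (`loss_and_grads`, `perturb`, `adversarial_train`, `radius`, `lr`, `steps`) are hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(w, b, X, y):
    """Mean logistic loss and its gradients w.r.t. w, b, and the inputs X."""
    p = sigmoid(X @ w + b)
    tiny = 1e-12  # guards against log(0)
    loss = -np.mean(y * np.log(p + tiny) + (1 - y) * np.log(1 - p + tiny))
    err = p - y                    # per-example derivative w.r.t. the logit
    grad_w = X.T @ err / len(y)
    grad_b = err.mean()
    grad_X = np.outer(err, w)      # per-example gradient w.r.t. the input
    return loss, grad_w, grad_b, grad_X

def perturb(w, b, X, y, radius):
    """Inner maximization (assumed form): one signed-gradient step,
    which keeps each example inside an L-infinity ball of size `radius`."""
    _, _, _, grad_X = loss_and_grads(w, b, X, y)
    return X + radius * np.sign(grad_X)

def adversarial_train(X, y, radius=0.1, lr=0.5, steps=200):
    """Outer minimization: gradient descent on the loss over perturbed
    examples that are regenerated at every parameter update."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        X_adv = perturb(w, b, X, y, radius)            # maximization step
        _, grad_w, grad_b, _ = loss_and_grads(w, b, X_adv, y)
        w -= lr * grad_w                               # minimization step
        b -= lr * grad_b
    return w, b
```

For a linear model the signed-gradient step happens to maximize each example's loss exactly within the L-infinity ball. For a deep network it is only a first-order approximation, and other perturbation sets or inner solvers could be substituted without changing the outer loop.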