Bridged Adversarial Training
This work addresses adversarial robustness in deep neural networks, offering an incremental improvement: it bridges the gap between clean and adversarial examples to make models more stable against large perturbations.
The paper tackles the issue that adversarially trained models can differ in characteristics such as margin and smoothness despite showing similar robustness. It proposes bridged adversarial training to mitigate the negative effect of smoothness regularizers on the margin and to improve robustness to large perturbations, with theoretical and empirical evidence of stable and better performance.
Adversarial robustness is considered a required property of deep neural networks. In this study, we discover that adversarially trained models can have significantly different characteristics in terms of margin and smoothness, even when they show similar robustness. Inspired by this observation, we investigate the effect of different regularizers and discover that the smoothness regularizer has a negative effect on maximizing the margin. Based on these analyses, we propose a new method called bridged adversarial training, which mitigates this negative effect by bridging the gap between clean and adversarial examples. We provide theoretical and empirical evidence that the proposed method provides stable and better robustness, especially for large perturbations.
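The abstract does not state the loss itself, so the following is only a minimal sketch of one way "bridging the gap between clean and adversarial examples" could be read. It assumes that intermediate points are placed by linear interpolation between the clean and adversarial inputs, and that the smoothness term sums the KL divergence between predictions at consecutive points. The names `bridged_loss`, `num_bridges`, and `beta`, as well as the toy linear classifier, are hypothetical and do not come from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def linear_model(weights, x):
    """Toy classifier (assumption for illustration): logits = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def bridged_loss(predict, x_clean, x_adv, label, num_bridges=2, beta=1.0):
    """Clean cross-entropy plus a chained smoothness term.

    Assumed form: the segment from x_clean to x_adv is split into
    `num_bridges` equal steps, and the KL divergence between predictions
    at consecutive points is summed and weighted by `beta`.
    """
    # Points x_clean + (k / num_bridges) * (x_adv - x_clean), k = 0..num_bridges
    points = [
        [c + (k / num_bridges) * (a - c) for c, a in zip(x_clean, x_adv)]
        for k in range(num_bridges + 1)
    ]
    probs = [softmax(predict(p)) for p in points]

    clean_ce = -math.log(probs[0][label] + 1e-12)
    smoothness = sum(
        kl_divergence(probs[k], probs[k + 1]) for k in range(num_bridges)
    )
    return clean_ce + beta * smoothness


# Usage with a 2-class identity-weight toy model and a hand-picked perturbation.
weights = [[1.0, 0.0], [0.0, 1.0]]
predict = lambda x: linear_model(weights, x)
loss = bridged_loss(predict, [1.0, 0.0], [0.7, 0.3], label=0, num_bridges=2)
```

Under these assumptions, `num_bridges=1` reduces to a single clean-versus-adversarial KL term, and larger values split that one large gap into several smaller ones. In practice `x_adv` would come from an attack such as PGD rather than being chosen by hand.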