Adv-4-Adv: Thwarting Changing Adversarial Perturbations via Adversarial Domain Adaptation
This paper addresses adversarial robustness in deep learning models and offers a way to improve generalization against diverse attacks, though the contribution appears incremental because it builds on existing adversarial domain adaptation techniques.
The paper tackles the failure of adversarial training to generalize to unseen adversarial perturbations by proposing Adv-4-Adv, a method that treats different attacks as domains and uses adversarial domain adaptation to learn robust representations. The results show that models trained with Adv-4-Adv on simple attacks like FGSM generalize to advanced attacks like PGD, exceeding state-of-the-art performance on Fashion-MNIST, SVHN, CIFAR-10, and CIFAR-100.
Although adversarial training can be useful against specific adversarial perturbations, it has also proven ineffective at generalizing to attacks that deviate from those used for training. We observe that this ineffectiveness is intrinsically connected to domain adaptability, another crucial issue in deep learning for which adversarial domain adaptation appears to be a promising solution. Consequently, we propose Adv-4-Adv, a novel adversarial training method that aims to retain robustness against unseen adversarial perturbations. Essentially, Adv-4-Adv treats attacks incurring different perturbations as distinct domains and, by leveraging adversarial domain adaptation, aims to remove the domain/attack-specific features. This forces a trained model to learn a robust domain-invariant representation, which in turn enhances its generalization ability. Extensive evaluations on Fashion-MNIST, SVHN, CIFAR-10, and CIFAR-100 demonstrate that a model trained by Adv-4-Adv on samples crafted by simple attacks (e.g., FGSM) generalizes to more advanced attacks (e.g., PGD), and that its performance exceeds state-of-the-art proposals on these datasets.
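
The "attacks as domains" idea can be illustrated with a minimal sketch. This is not the authors' implementation: a tiny logistic model stands in for a deep network, all function names (`fgsm`, `build_domain_batch`, `extractor_objective`) are hypothetical, and the objective shown is the generic form used in adversarial domain adaptation, assumed here rather than taken from the paper. The sketch crafts FGSM samples, tags clean and perturbed inputs with domain labels, and combines a task loss with a negated domain loss so that the feature extractor is rewarded for confusing the domain classifier.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    # Returns -1, 0, or 1.
    return (v > 0) - (v < 0)

def predict(x, w, b):
    # Logistic model (a stand-in for a deep network): p = sigmoid(w . x + b).
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def bce(p, t):
    # Binary cross-entropy for one prediction p and target t.
    p = min(max(p, 1e-7), 1 - 1e-7)
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))

def fgsm(x, y, w, b, eps):
    # FGSM: step of size eps along the sign of the loss gradient w.r.t. x.
    # For the logistic model with cross-entropy, that gradient is (p - y) * w.
    p = predict(x, w, b)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

def build_domain_batch(samples, w, b, eps):
    # Tag each input with a domain label: 0 = clean, 1 = FGSM-perturbed.
    batch = []
    for x, y in samples:
        batch.append((x, y, 0))
        batch.append((fgsm(x, y, w, b, eps), y, 1))
    return batch

def extractor_objective(task_loss, domain_loss, lam=1.0):
    # Assumed generic objective for the feature extractor: minimize the
    # task loss while maximizing the domain classifier's loss.
    return task_loss - lam * domain_loss

w, b = [1.0, -2.0], 0.5
samples = [([0.2, 0.1], 1), ([-0.3, 0.4], 0)]
batch = build_domain_batch(samples, w, b, eps=0.1)
for x, y, d in batch:
    print(d, [round(v, 3) for v in x], round(bce(predict(x, w, b), y), 4))
```

In a full training loop, a separate domain classifier would be trained to minimize `domain_loss` on the learned features, while the feature extractor minimizes `extractor_objective`; the trade-off weight `lam` is an illustrative parameter.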