CV CR LGFeb 19, 2024

Adversarial Feature Alignment: Balancing Robustness and Accuracy in Deep Learning via Adversarial Training

Leo Hyun Park, Jaeuk Kim, Myung Gyo Oh, Jaewoo Park, Taekyoung Kwon

arXiv:2402.12187v19.612 citationsh-index: 6AISec@CCS

Originality Incremental advance

AI Analysis

It addresses a critical security challenge for deploying reliable deep learning models, though it appears incremental as it builds on existing adversarial training methods.

This paper tackles the problem of balancing robustness against adversarial attacks and accuracy on clean data in deep learning models by proposing Adversarial Feature Alignment (AFA), a novel adversarial training method that reduces the drop in clean accuracy to 1.86% on CIFAR10 and 8.91% on CIFAR100 while improving robust accuracy.

Deep learning models continue to advance in accuracy, yet they remain vulnerable to adversarial attacks, which often lead to the misclassification of adversarial examples. Adversarial training is used to mitigate this problem by increasing robustness against these attacks. However, this approach typically reduces a model's standard accuracy on clean, non-adversarial samples. The necessity for deep learning models to balance both robustness and accuracy for security is obvious, but achieving this balance remains challenging, and the underlying reasons are yet to be clarified. This paper proposes a novel adversarial training method called Adversarial Feature Alignment (AFA), to address these problems. Our research unveils an intriguing insight: misalignment within the feature space often leads to misclassification, regardless of whether the samples are benign or adversarial. AFA mitigates this risk by employing a novel optimization algorithm based on contrastive learning to alleviate potential feature misalignment. Through our evaluations, we demonstrate the superior performance of AFA. The baseline AFA delivers higher robust accuracy than previous adversarial contrastive learning methods while minimizing the drop in clean accuracy to 1.86% and 8.91% on CIFAR10 and CIFAR100, respectively, in comparison to cross-entropy. We also show that joint optimization of AFA and TRADES, accompanied by data augmentation using a recent diffusion model, achieves state-of-the-art accuracy and robustness.

View on arXiv PDF

Similar