LG · Oct 16, 2024

New Paradigm of Adversarial Training: Releasing Accuracy-Robustness Trade-Off via Dummy Class

Yanyun Wang, Li Liu, Zi Liang, Yi R. Fung, Qingqing Ye, Haibo Hu

arXiv:2410.12671v27.92 citations · h-index: 18 · Has Code

Originality: Highly original

AI Analysis

This work addresses a foundational problem in machine learning and could improve the security and reliability of deep neural networks against adversarial attacks, though it appears incremental because it builds on existing adversarial training methods.

The paper tackles the inherent accuracy-robustness trade-off in adversarial training by proposing a new paradigm that introduces dummy classes to handle hard adversarial samples, resulting in concurrent improvements in both accuracy and robustness over state-of-the-art benchmarks.

Adversarial Training (AT) is one of the most effective methods to enhance the robustness of Deep Neural Networks (DNNs). However, existing AT methods suffer from an inherent accuracy-robustness trade-off. Previous works have studied this issue under the current AT paradigm, but still face over 10% accuracy reduction without significant robustness improvement over simple baselines such as PGD-AT. This inherent trade-off raises a question: does the current AT paradigm, which assumes that corresponding benign and adversarial samples should be learned as the same class, inappropriately mix clean and robust objectives that may be essentially inconsistent? In fact, our empirical results show that up to 40% of CIFAR-10 adversarial samples always fail to satisfy this assumption across various AT methods and robust models, explicitly indicating room for improvement in the current AT paradigm. To relax this overly strict assumption and ease the tension between clean and robust learning, we propose a new AT paradigm that introduces an additional dummy class for each original class, aiming to accommodate hard adversarial samples whose distribution has shifted after perturbation. Robustness w.r.t. these adversarial samples can be achieved by runtime recovery from the predicted dummy classes to the corresponding original ones, without conflicting with the clean objective of accuracy on benign samples. Finally, based on our new paradigm, we propose a novel DUmmy Classes-based Adversarial Training (DUCAT) method that concurrently improves accuracy and robustness in a plug-and-play manner, involving only the logits, the loss, and a proposed two-hot soft label-based supervised signal. Our method outperforms state-of-the-art (SOTA) benchmarks, effectively releasing the current trade-off. The code is available at https://github.com/FlaAI/DUCAT.
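The three ingredients the abstract names (doubled logits, a two-hot soft label, and runtime recovery from a dummy class to its original class) can be illustrated with a minimal sketch. This is not the code from the DUCAT repository: the index layout (the dummy class for class `i` sits at index `i + C`), the function names, and the `primary_weight` parameter are all assumptions made for illustration, and the abstract does not specify how the label mass is split between benign and adversarial samples.

```python
import math


def two_hot_label(y, num_classes, primary_weight):
    """Build a soft label over 2*C outputs for true class y.

    Assumed layout: indices 0..C-1 are the original classes and
    index y + C is the dummy class paired with class y.
    primary_weight is the mass placed on the original class; the
    remainder goes to its dummy class (an illustrative choice).
    """
    if not 0.0 <= primary_weight <= 1.0:
        raise ValueError("primary_weight must lie in [0, 1]")
    label = [0.0] * (2 * num_classes)
    label[y] = primary_weight
    label[y + num_classes] = 1.0 - primary_weight
    return label


def soft_cross_entropy(logits, soft_label):
    """Cross-entropy between softmax(logits) and a soft label."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return -sum(t * (v - log_z) for v, t in zip(logits, soft_label))


def predict_with_recovery(logits, num_classes):
    """Take the argmax over all 2*C logits, then map a dummy-class
    prediction back to its original class (runtime recovery)."""
    raw = max(range(len(logits)), key=logits.__getitem__)
    return raw % num_classes
```

For example, with `C = 3`, a sample whose largest logit falls on index 4 (the assumed dummy class for class 1) is still reported as class 1 by `predict_with_recovery`, which is the sense in which an adversarial sample can be learned as a separate class without counting as a misclassification.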

