LG | AI | Oct 21, 2024

Conflict-Aware Adversarial Training

arXiv:2410.16579v1 | 2 citations | h-index: 32
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in adversarial training for deep neural networks, offering an incremental improvement over existing methods.

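The paper addresses the trade-off between standard performance and adversarial robustness in adversarial training. It attributes the poor trade-off of the usual weighted-average objective to conflicts between the gradients of the standard and adversarial losses, and proposes Conflict-Aware Adversarial Training (CA-AT), which introduces a conflict-aware factor into the convex combination of the two losses. Experimental results show that CA-AT consistently offers a superior trade-off, both when training from scratch and under parameter-efficient finetuning.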

Adversarial training is the most effective method to obtain adversarial robustness for deep neural networks by directly involving adversarial samples in the training procedure. To obtain an accurate and robust model, the weighted-average method is applied to optimize standard loss and adversarial loss simultaneously. In this paper, we argue that the weighted-average method does not provide the best trade-off between standard performance and adversarial robustness. We attribute this failure to the conflict between the gradients derived from the standard and adversarial losses, and further demonstrate, both theoretically and practically, that this conflict increases with the attack budget. To alleviate this problem, we propose a new trade-off paradigm for adversarial training with a conflict-aware factor for the convex combination of standard and adversarial loss, named Conflict-Aware Adversarial Training (CA-AT). Comprehensive experimental results show that CA-AT consistently offers a superior trade-off between standard performance and adversarial robustness under the settings of adversarial training from scratch and parameter-efficient finetuning.
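The gradient conflict described in the abstract can be illustrated with a toy example. The sketch below is not the CA-AT update rule, which the abstract does not spell out. It only shows, under assumed values, how the weighted-average gradient behaves when the standard and adversarial gradients point in conflicting directions. The vectors g_std and g_adv, the weight lam, and all helper names are hypothetical and chosen for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Cosine of the angle between two gradients; a negative value signals conflict."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm > 0 else 0.0


def weighted_average(g_std, g_adv, lam):
    """Gradient of the convex combination (1 - lam) * L_std + lam * L_adv."""
    return [(1 - lam) * a + lam * b for a, b in zip(g_std, g_adv)]


# Made-up gradients of the standard and adversarial losses (assumption).
g_std = [1.0, 0.0]
g_adv = [-0.8, 0.6]
lam = 0.9  # hypothetical weight that favours the adversarial loss

cos = cosine_similarity(g_std, g_adv)
g_avg = weighted_average(g_std, g_adv, lam)

# A descent step moves along -g_avg. To first order, a loss with gradient g
# decreases only if dot(g, g_avg) > 0.
std_alignment = dot(g_std, g_avg)
adv_alignment = dot(g_adv, g_avg)

print(f"cosine similarity: {cos:.2f}")
print(f"standard loss decreases: {std_alignment > 0}")
print(f"adversarial loss decreases: {adv_alignment > 0}")
```

With these assumed numbers the two gradients have negative cosine similarity, and the weighted-average step reduces the adversarial loss while increasing the standard loss to first order. This is the kind of conflict the paper argues a fixed weighting cannot resolve.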

