LG CV | Mar 26, 2024

Boosting Adversarial Training via Fisher-Rao Norm-based Regularization

arXiv:2403.17520v19.215 citations | h-index: 27 | Has Code | CVPR

Originality: Incremental advance

AI Analysis

This work addresses a key challenge in adversarial machine learning by mitigating the accuracy-robustness trade-off, offering a practical solution for enhancing model security without significant computational cost.

The paper tackles the trade-off between adversarial robustness and standard accuracy in adversarial training by proposing a regularization framework based on the Fisher-Rao norm, which reduces the generalization gap with minimal computational overhead and improves performance across multiple adversarial training algorithms.

Adversarial training is widely used to improve the adversarial robustness of deep neural networks. Yet mitigating the degradation of standard generalization performance in adversarially trained models remains an open problem. This paper attempts to resolve this issue through the lens of model complexity. First, we leverage the Fisher-Rao norm, a geometrically invariant metric for model complexity, to establish non-trivial bounds on the Cross-Entropy Loss-based Rademacher complexity of a ReLU-activated Multi-Layer Perceptron. Then we generalize a complexity-related variable, which is sensitive to changes in model width and to the trade-off factors in adversarial training. Moreover, extensive empirical evidence validates that this variable correlates highly with the generalization gap of Cross-Entropy loss between adversarially trained and standard-trained models, especially during the initial and final phases of training. Building upon this observation, we propose a novel regularization framework, called Logit-Oriented Adversarial Training (LOAT), which can mitigate the trade-off between robustness and accuracy while imposing only a negligible increase in computational overhead. Our extensive experiments demonstrate that the proposed regularization strategy can boost the performance of the prevalent adversarial training algorithms, including PGD-AT, TRADES, TRADES (LSE), MART, and DM-AT, across various network architectures. Our code will be available at https://github.com/TrustAI/LOAT.
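To make the Fisher-Rao norm mentioned in the abstract more concrete, here is a minimal sketch of how an empirical version of it can be computed from logits alone. It relies on an assumption not stated in the abstract: for a bias-free ReLU MLP, the squared Fisher-Rao norm is known to reduce to the squared number of weight matrices times the mean of the squared inner product between the loss gradient with respect to the logits and the logits themselves. The function names, the `num_weight_layers` parameter, and the use of data labels in place of model-sampled labels are illustrative choices, and this is not the complexity variable or the LOAT regularizer defined in the paper.

```python
import numpy as np


def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def empirical_fisher_rao_norm(logits, labels, num_weight_layers):
    """Empirical Fisher-Rao norm for a bias-free ReLU MLP (illustrative).

    logits:            array of shape (n_samples, n_classes)
    labels:            integer class labels of shape (n_samples,)
    num_weight_layers: number of weight matrices in the MLP (hypothetical
                       parameter name; assumes no bias terms)
    """
    probs = softmax(logits)
    onehot = np.eye(logits.shape[1])[labels]
    # Gradient of the cross-entropy loss with respect to the logits.
    grad_wrt_logits = probs - onehot
    # Per-sample inner product <dL/df, f>.
    inner = (grad_wrt_logits * logits).sum(axis=1)
    return num_weight_layers * np.sqrt(np.mean(inner ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo_logits = rng.normal(size=(8, 10))
    demo_labels = rng.integers(0, 10, size=8)
    print(empirical_fisher_rao_norm(demo_logits, demo_labels, num_weight_layers=3))
```

Because the quantity depends only on logits and labels, it needs no extra backward pass through the network, which is consistent with the abstract's emphasis on a logit-oriented approach with negligible overhead.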
