LG AI CV | Aug 25, 2025

Robustness Feature Adapter for Efficient Adversarial Training

arXiv:2508.17680v1 | 1 citation | h-index: 3 | ECAI

Originality: Incremental advance

AI Analysis

This work addresses efficiency and reliability problems in adversarial training for foundation models, representing an incremental improvement.

The paper tackles the computational overhead and robust overfitting issues in adversarial training for large models by proposing a feature-space adapter-based approach, which improves efficiency and generalizes robustness to unseen attacks.

Adversarial training (AT) with projected gradient descent is the most popular method to improve model robustness under adversarial attacks. However, the computational overhead becomes prohibitively large when AT is applied to large backbone models. AT is also known to suffer from robust overfitting. This paper addresses both problems simultaneously, with the aim of building more trustworthy foundation models. In particular, we propose a new adapter-based approach for efficient AT directly in the feature space. We show that the proposed approach can improve the inner-loop convergence quality by eliminating robust overfitting. As a result, it significantly increases computational efficiency and improves model accuracy by generalizing adversarial robustness to unseen attacks. We demonstrate its effectiveness across different backbone architectures and in AT at scale.
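The abstract does not describe the adapter architecture or training loss, so the following is only a minimal sketch of the generic idea of running the projected-gradient inner loop on features rather than on input pixels. It assumes a frozen backbone whose output features are already computed and a linear softmax head. All names (`pgd_feature_attack`, `eps`, `alpha`, `steps`) and all values are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(logits):
    # Numerically stable row-wise softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(W, b, feats, labels):
    # Mean cross-entropy of a linear head applied to feature vectors.
    probs = softmax(feats @ W + b)
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

def loss_grad_wrt_feats(W, b, feats, labels):
    # Gradient of the mean cross-entropy with respect to the features:
    # (softmax - one_hot) @ W^T, divided by the batch size.
    probs = softmax(feats @ W + b)
    probs[np.arange(len(labels)), labels] -= 1.0
    return probs @ W.T / len(labels)

def pgd_feature_attack(W, b, feats, labels, eps=0.1, alpha=0.02, steps=10):
    # L-infinity PGD in feature space (illustrative assumption, not the
    # paper's procedure): ascend the loss with signed gradient steps and
    # project the perturbation back into the eps-ball after every step.
    delta = np.zeros_like(feats)
    for _ in range(steps):
        grad = loss_grad_wrt_feats(W, b, feats + delta, labels)
        delta = np.clip(delta + alpha * np.sign(grad), -eps, eps)
    return delta

# Toy stand-ins for frozen-backbone features and a linear head.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))      # 8 samples, 16-dim features
labels = rng.integers(0, 4, size=8)   # 4 classes
W = rng.normal(size=(16, 4))
b = np.zeros(4)

delta = pgd_feature_attack(W, b, feats, labels)
clean_loss = cross_entropy(W, b, feats, labels)
adv_loss = cross_entropy(W, b, feats + delta, labels)
print(f"clean loss: {clean_loss:.4f}, adversarial loss: {adv_loss:.4f}")
```

In a sketch like this the attack never backpropagates through the backbone, which is one plausible reason a feature-space inner loop could be cheaper than pixel-space PGD. Whether the paper's efficiency gain arises in exactly this way is not stated in the abstract.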
