Regional Adversarial Training for Better Robust Generalization
This work addresses the challenge of improving the adversarial robustness of machine learning models, particularly in security-critical applications. The contribution is incremental in that it builds on standard adversarial training.
The paper tackles weak adversarial robust generalization in existing adversarial training methods. It proposes Regional Adversarial Training (RAT), which aims to improve robust generalization on test data by training on diverse perturbed points and applying a distance-aware label smoothing mechanism.
Adversarial training (AT) has been demonstrated as one of the most promising defense methods against various adversarial attacks. To our knowledge, existing AT-based methods usually train with the locally most adversarial perturbed points and treat all the perturbed points equally, which may lead to considerably weaker adversarial robust generalization on test data. In this work, we introduce a new adversarial training framework that considers the diversity as well as the characteristics of the perturbed points in the vicinity of benign samples. To realize the framework, we propose a Regional Adversarial Training (RAT) defense method that first takes the attack path generated by projected gradient descent (PGD), a typical iterative attack method, and constructs an adversarial region based on that path. RAT then samples diverse perturbed training points efficiently inside this region and uses a distance-aware label smoothing mechanism to capture our intuition that perturbed points at different locations should have different impacts on model performance. Extensive experiments on several benchmark datasets show that RAT consistently and significantly improves on standard adversarial training (SAT) and exhibits better robust generalization.
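
The three steps named in the abstract (record a PGD attack path, sample inside a region built from it, soften labels by distance) can be illustrated with a minimal NumPy sketch. The abstract does not specify how the region is constructed or how the smoothing is computed, so two choices here are assumptions made purely for illustration: the region is taken to be the line segments between consecutive path points, and the smoothing strength grows linearly with the L-infinity distance from the benign sample. The toy linear softmax model and the names `pgd_path`, `sample_from_path`, `distance_aware_label`, and `max_smooth` are hypothetical and do not come from the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def pgd_path(x, y, W, eps=0.3, step=0.1, n_steps=5):
    """Points visited by L-inf PGD on a toy linear softmax model (logits = W @ x).

    For this model the cross-entropy gradient w.r.t. x is W.T @ (p - onehot(y)).
    """
    onehot = np.eye(W.shape[0])[y]
    x_adv = x.copy()
    path = [x_adv.copy()]                # the path starts at the benign sample
    for _ in range(n_steps):
        grad = W.T @ (softmax(W @ x_adv) - onehot)
        x_adv = x_adv + step * np.sign(grad)        # ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)    # project onto the eps-ball
        path.append(x_adv.copy())
    return path

def sample_from_path(path, rng):
    """ASSUMPTION: the 'adversarial region' is the set of segments between
    consecutive path points; draw one point uniformly on a random segment."""
    i = rng.integers(len(path) - 1)
    t = rng.uniform()
    return (1 - t) * path[i] + t * path[i + 1]

def distance_aware_label(x, x_pert, y, n_classes, eps, max_smooth=0.2):
    """ASSUMPTION: smoothing grows linearly with L-inf distance from x,
    reaching max_smooth at the boundary of the eps-ball."""
    d = np.max(np.abs(x_pert - x))
    alpha = max_smooth * min(d / eps, 1.0)
    label = np.full(n_classes, alpha / (n_classes - 1))
    label[y] = 1.0 - alpha
    return label

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))              # 3 classes, 4 input features
x, y, eps = rng.normal(size=4), 1, 0.3
path = pgd_path(x, y, W, eps=eps)
x_pert = sample_from_path(path, rng)
soft_label = distance_aware_label(x, x_pert, y, n_classes=3, eps=eps)
```

In a training loop, `x_pert` and `soft_label` would replace the single most adversarial point and its hard label used by standard adversarial training. Because the L-infinity ball is convex, any point sampled on these segments stays within the perturbation budget.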