Revisiting and Advancing Adversarial Training Through A Simple Baseline
This work addresses adversarial robustness in machine learning models, offering an incremental improvement by simplifying and optimizing existing training components.
The paper aims to improve adversarial training as a defense against adversarial attacks. It identifies key factors such as the loss function and data augmentation, and introduces SimpleAT, a baseline method that achieves competitive robustness on CIFAR-10/100 and Tiny-ImageNet while reducing robust overfitting and the variance of model predictions.
In this paper, we examine the essential components of adversarial training, a pioneering defense technique against adversarial attacks. We show that several factors that are independent of the model architecture, such as the loss function, the learning rate scheduler, and data augmentation, influence adversarial robustness and generalization. With these factors controlled, we introduce a simple baseline approach, termed SimpleAT, that performs competitively with recent methods and mitigates robust overfitting. We conduct extensive experiments on CIFAR-10/100 and Tiny-ImageNet, which validate the robustness of SimpleAT against state-of-the-art adversarial attacks such as AutoAttack. SimpleAT also performs well under various image corruptions, such as those in CIFAR-10-C. In addition, we empirically show that SimpleAT reduces the variance in model predictions, which is considered the primary contributor to robust overfitting. Finally, our results reveal connections between SimpleAT and many advanced state-of-the-art adversarial defense methods.
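For readers unfamiliar with the setting, the following is a minimal sketch of generic adversarial training, not of SimpleAT itself. It assumes a logistic-regression model on synthetic 2-D data and an L-infinity projected gradient descent (PGD) attack. All function names, the toy data, and the hyperparameters (`epsilon`, `lr`, `epochs`, `steps`) are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(w, b, x, y):
    """Mean binary cross-entropy and its gradients w.r.t. w, b, and the inputs x."""
    p = sigmoid(x @ w + b)
    tiny = 1e-12  # avoids log(0)
    loss = -np.mean(y * np.log(p + tiny) + (1 - y) * np.log(1 - p + tiny))
    err = (p - y) / len(y)       # d(loss)/d(logit), one entry per example
    grad_w = x.T @ err
    grad_b = err.sum()
    grad_x = np.outer(err, w)    # d(loss)/d(x), same shape as x
    return loss, grad_w, grad_b, grad_x

def pgd_attack(w, b, x, y, epsilon, step_size, steps):
    """L-infinity PGD: ascend the loss, then project back into the epsilon-ball."""
    x_adv = x.copy()
    for _ in range(steps):
        _, _, _, grad_x = loss_and_grads(w, b, x_adv, y)
        x_adv = x_adv + step_size * np.sign(grad_x)
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)
    return x_adv

def adversarial_train(x, y, epsilon=0.3, lr=0.5, epochs=200, steps=5):
    """Inner step: craft adversarial inputs. Outer step: descend the loss on them."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        x_adv = pgd_attack(w, b, x, y, epsilon, epsilon / 4, steps)
        _, grad_w, grad_b, _ = loss_and_grads(w, b, x_adv, y)
        w = w - lr * grad_w
        b = b - lr * grad_b
    return w, b

def accuracy(w, b, x, y):
    return np.mean((sigmoid(x @ w + b) > 0.5) == (y == 1))

# Toy data (hypothetical): two Gaussian clusters, one per class.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(-2.0, 0.5, (100, 2)), rng.normal(2.0, 0.5, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

w, b = adversarial_train(x, y)
x_adv = pgd_attack(w, b, x, y, epsilon=0.3, step_size=0.075, steps=5)
print("clean accuracy: ", accuracy(w, b, x, y))
print("robust accuracy:", accuracy(w, b, x_adv, y))
```

In this sketch, the architecture-independent factors the abstract highlights correspond to the choice of loss in `loss_and_grads`, the constant `lr` (where a scheduler could be substituted), and the input pipeline feeding `x` (where data augmentation could be applied).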