THAT: Two Head Adversarial Training for Improving Robustness at Scale
This paper addresses adversarial robustness for deep learning models in real-world, many-class scenarios, whereas most prior work has focused on datasets with relatively few classes.
The paper tackles the challenge of scaling adversarial training to large-scale datasets such as ImageNet by proposing Two Head Adversarial Training (THAT), which achieves state-of-the-art robust accuracy while maintaining high natural accuracy.
Many variants of adversarial training have been proposed, with most research focusing on problems with relatively few classes. In this paper, we propose Two Head Adversarial Training (THAT), a two-stream adversarial learning network designed to handle the large-scale, many-class ImageNet dataset. The proposed method trains a network with two heads and two loss functions: one to minimize feature-space domain shift between natural and adversarial images, and one to promote high classification accuracy. This combination delivers a hardened network that achieves state-of-the-art robust accuracy while maintaining high natural accuracy on ImageNet. Through extensive experiments, we demonstrate that the proposed framework outperforms alternative methods under both standard and "free" adversarial training settings.
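To make the two-loss idea concrete, the following is a minimal sketch of how a classification term and a feature-alignment term might be combined for a single example. It is an illustration under stated assumptions, not the method as implemented in the paper: the abstract does not specify the form of either loss, so the mean squared feature distance used as the domain-shift measure, the weighting parameter `shift_weight`, and all function names here are hypothetical.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy for one example, using a numerically stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def feature_shift(feat_nat, feat_adv):
    """Assumed shift measure: mean squared distance between the features
    of a natural image and those of its adversarial counterpart."""
    return sum((a - b) ** 2 for a, b in zip(feat_nat, feat_adv)) / len(feat_nat)


def two_head_loss(feat_nat, feat_adv, logits_adv, label, shift_weight=1.0):
    """Combine the two objectives into one scalar.

    feat_nat, feat_adv: feature vectors for the natural and adversarial input.
    logits_adv: classification-head output for the adversarial input.
    shift_weight: hypothetical trade-off between the two terms.
    Returns (total, classification_loss, shift_loss).
    """
    cls_loss = softmax_cross_entropy(logits_adv, label)
    shift_loss = feature_shift(feat_nat, feat_adv)
    return cls_loss + shift_weight * shift_loss, cls_loss, shift_loss


# Identical features give zero shift, so only the classification term remains.
total, cls_loss, shift_loss = two_head_loss([1.0, 2.0], [1.0, 2.0], [0.0, 0.0], label=0)
```

In this sketch the shift term vanishes whenever the adversarial features match the natural ones, so the total reduces to the classification loss. A real training loop would also need an attack to generate the adversarial input and gradients through both heads, which are omitted here.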