CV | Sep 16, 2022

Enhance the Visual Representation via Discrete Adversarial Training

Xiaofeng Mao, Yuefeng Chen, Ranjie Duan, Yao Zhu, Gege Qi, Shaokai Ye, Xiaodan Li, Rong Zhang, Hui Xue

arXiv:2209.07735v1 | 15.644 citations | h-index: 25

Originality: Highly original

AI Analysis

This paper addresses the limited usefulness of adversarial training in industrial-scale computer vision applications by borrowing insights from NLP to improve both robustness and generalization.

The paper tackles the problem that adversarial training (AT) harms standard performance in computer vision, whereas in NLP it benefits generalization. It proposes Discrete Adversarial Training (DAT), which converts images to discrete inputs before applying adversarial perturbations. An MAE-pre-trained model fine-tuned with DAT reaches state-of-the-art results: 31.40 mCE on ImageNet-C and 32.77% top-1 accuracy on Stylized-ImageNet.

Adversarial Training (AT), commonly accepted as one of the most effective defenses against adversarial examples, can substantially harm standard performance and therefore has limited usefulness in industrial-scale production and applications. Surprisingly, the opposite holds in Natural Language Processing (NLP) tasks, where AT can even benefit generalization. We observe that the merit of AT in NLP tasks may derive from the discrete and symbolic input space. To borrow this advantage of NLP-style AT, we propose Discrete Adversarial Training (DAT). DAT leverages VQGAN to convert image data into discrete, text-like inputs, i.e. visual words. It then minimizes the maximal risk on such discrete images with symbolic adversarial perturbations. We further give an explanation from the perspective of distribution to demonstrate the effectiveness of DAT. As a plug-and-play technique for enhancing visual representations, DAT achieves significant improvements on multiple tasks, including image classification, object detection and self-supervised learning. In particular, a model pre-trained with Masked Auto-Encoding (MAE) and fine-tuned with DAT, without extra data, achieves 31.40 mCE on ImageNet-C and 32.77% top-1 accuracy on Stylized-ImageNet, setting a new state of the art. The code will be available at https://github.com/alibaba/easyrobust.
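
The idea of minimizing the maximal risk over discrete inputs can be illustrated with a toy sketch. This is not the authors' implementation and does not use the easyrobust API. It assumes a random NumPy codebook in place of VQGAN, a linear logistic classifier in place of a vision backbone, and a greedy token-substitution search in place of whatever perturbation procedure the paper uses. All function names (`quantize`, `symbolic_attack`, `dat_step`) are hypothetical.

```python
import numpy as np

def quantize(patches, codebook):
    """Map each patch vector to the index of its nearest codebook entry
    (a stand-in for a VQGAN tokenizer; assumption, not the paper's model)."""
    # Squared distances between every patch and every code: (n_patches, n_codes)
    d = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

def features(tokens, codebook):
    """Decode tokens back to vectors and average them into one feature vector."""
    return codebook[tokens].mean(axis=0)

def loss(tokens, codebook, w, y):
    """Logistic loss of a linear classifier; the label y is +1 or -1."""
    z = features(tokens, codebook) @ w
    return np.log1p(np.exp(-y * z))

def symbolic_attack(tokens, codebook, w, y, budget=1):
    """Inner maximization (toy): greedily swap up to `budget` tokens for the
    codes that increase the loss the most."""
    adv = tokens.copy()
    for _ in range(budget):
        best_loss, best_pos, best_code = loss(adv, codebook, w, y), None, None
        for i in range(len(adv)):
            for k in range(len(codebook)):
                trial = adv.copy()
                trial[i] = k
                trial_loss = loss(trial, codebook, w, y)
                if trial_loss > best_loss:
                    best_loss, best_pos, best_code = trial_loss, i, k
        if best_pos is None:  # no single swap increases the loss
            break
        adv[best_pos] = best_code
    return adv

def dat_step(w, tokens, codebook, y, lr=0.1, budget=1):
    """Outer minimization (toy): one gradient step on the adversarial tokens."""
    adv = symbolic_attack(tokens, codebook, w, y, budget)
    x = features(adv, codebook)
    # d/dw log(1 + exp(-y * w.x)) = -y * x / (1 + exp(y * w.x))
    grad = -y * x / (1.0 + np.exp(y * (x @ w)))
    return w - lr * grad, adv

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    codebook = rng.normal(size=(8, 4))   # 8 "visual words" of dimension 4
    patches = rng.normal(size=(6, 4))    # 6 patch embeddings of one "image"
    w = rng.normal(size=4)
    tokens = quantize(patches, codebook)
    w_new, adv = dat_step(w, tokens, codebook, y=1, budget=2)
    print("clean tokens:", tokens, "adversarial tokens:", adv)
```

The exhaustive search is only feasible because the toy codebook and token sequence are tiny. A practical system would need a cheaper way to find adversarial tokens, which this sketch does not attempt to model.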
