LG Sep 28, 2022

Exploring the Relationship between Architecture and Adversarially Robust Generalization

Aishan Liu, Shiyu Tang, Siyuan Liang, Ruihao Gong, Boxi Wu, Xianglong Liu, Dacheng Tao

arXiv:2209.14105v2 22.9 27 citations h-index: 63

Originality: Incremental advance

AI Analysis

The paper addresses the problem of improving the adversarial robustness of deep neural networks and is aimed mainly at researchers and practitioners in computer vision. It offers insights into architectural choices, though the advance is incremental because it builds on existing adversarial training methods.

This paper investigates how architectural design affects adversarially robust generalization, finding that Vision Transformers like PVT and CoAtNet achieve better generalization on multiple adversarial attacks compared to CNNs, which tend to overfit, based on experiments on ImageNette and CIFAR-10 datasets.

Adversarial training has been demonstrated to be one of the most effective remedies for defending against adversarial examples, yet it often suffers from a huge robustness generalization gap on unseen testing adversaries, deemed the adversarially robust generalization problem. Despite the preliminary understandings devoted to adversarially robust generalization, little is known from the architectural perspective. To bridge the gap, this paper for the first time systematically investigated the relationship between adversarially robust generalization and architectural design. In particular, we comprehensively evaluated 20 most representative adversarially trained architectures on ImageNette and CIFAR-10 datasets towards multiple ℓp-norm adversarial attacks. Based on the extensive experiments, we found that, under aligned settings, Vision Transformers (e.g., PVT, CoAtNet) often yield better adversarially robust generalization while CNNs tend to overfit on specific attacks and fail to generalize on multiple adversaries. To better understand the nature behind it, we conduct theoretical analysis via the lens of Rademacher complexity. We revealed the fact that the higher weight sparsity contributes significantly towards the better adversarially robust generalization of Transformers, which can be often achieved by the specially-designed attention blocks. We hope our paper could help to better understand the mechanism for designing robust DNNs. Our model weights can be found at http://robust.art.
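
For readers unfamiliar with the ℓp-norm threat models mentioned in the abstract, the following is a minimal sketch, not the authors' code, of how a perturbation is measured and kept within an attack budget. The function names `lp_norm` and `project_onto_ball`, the budget name `eps`, and the restriction to p = 2 and p = ∞ are all assumptions made for illustration; the abstract does not say which norms or budgets were used.

```python
import math


def lp_norm(delta, p):
    """Return the l_p norm of a perturbation given as a list of floats."""
    if p == math.inf:
        # l_inf norm: the largest absolute coordinate.
        return max(abs(d) for d in delta)
    return sum(abs(d) ** p for d in delta) ** (1.0 / p)


def project_onto_ball(delta, eps, p):
    """Project a perturbation onto the l_p ball of radius eps.

    Hypothetical helper: only p = 2 and p = inf are handled here.
    """
    if p == math.inf:
        # Clip every coordinate into [-eps, eps].
        return [max(-eps, min(eps, d)) for d in delta]
    if p == 2:
        norm = lp_norm(delta, 2)
        if norm <= eps:
            # Already inside the ball: return an unchanged copy.
            return list(delta)
        # Rescale so the result lies on the surface of the ball.
        scale = eps / norm
        return [d * scale for d in delta]
    raise ValueError("this sketch only handles p = 2 and p = math.inf")
```

Projection-based attacks typically apply a step like this after each update so the perturbation stays inside the chosen budget. The same perturbation can sit well inside one ball and far outside another, which is one intuition for why a model trained against a single norm may not generalize to other adversaries.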
