CV LG | Jan 2, 2024

SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization

Xixu Hu, Runkai Zheng, Jindong Wang, Cheuk Hang Leung, Qi Wu, Xing Xie

arXiv:2402.03317v28.77 citations | h-index: 14 | Has Code | ECCV

Originality: Highly original

AI Analysis

This work addresses the security of Vision Transformers in computer vision applications, offering a theoretically grounded defense against adversarial attacks.

The paper tackles the vulnerability of Vision Transformers to adversarial attacks by introducing SpecFormer, which uses Maximum Singular Value Penalization to enhance robustness, achieving state-of-the-art results on CIFAR and ImageNet datasets.

Vision Transformers (ViTs) are increasingly used in computer vision due to their high performance, but their vulnerability to adversarial attacks is a concern. Existing methods lack a solid theoretical basis, focusing mainly on empirical training adjustments. This study introduces SpecFormer, tailored to fortify ViTs against adversarial attacks, with theoretical underpinnings. We establish local Lipschitz bounds for the self-attention layer and propose the Maximum Singular Value Penalization (MSVP) to precisely manage these bounds. By incorporating MSVP into ViTs' attention layers, we enhance the model's robustness without compromising training efficiency. SpecFormer, the resulting model, outperforms other state-of-the-art models in defending against adversarial attacks, as proven by experiments on CIFAR and ImageNet datasets. Code is released at https://github.com/microsoft/robustlearn.
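The idea of penalizing maximum singular values can be illustrated with a small, dependency-free sketch. It is not the authors' implementation (see the released code for that). It assumes the penalty is a weighted sum of the largest singular values of the attention projection matrices, estimated by power iteration. The names `max_singular_value`, `msvp_penalty`, and `coeff` are hypothetical, and the exact penalty form and weighting used in the paper may differ.

```python
import math
import random


def _matvec(W, v):
    """Compute W @ v for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def _rmatvec(W, u):
    """Compute W^T @ u for a matrix stored as a list of rows."""
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(len(W[0]))]


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def max_singular_value(W, iters=100, seed=0):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in W[0]]
    n = _norm(v)
    v = [x / n for x in v]
    for _ in range(iters):
        u = _matvec(W, v)
        sigma = _norm(u)
        if sigma == 0.0:
            return 0.0  # v lies in the null space (e.g. W is the zero matrix)
        u = [x / sigma for x in u]
        v = _rmatvec(W, u)
        n = _norm(v)
        if n == 0.0:
            return 0.0
        v = [x / n for x in v]
    return _norm(_matvec(W, v))


def msvp_penalty(weights, coeff=1e-2):
    """Assumed penalty form: coeff times the sum of largest singular values.

    `weights` is a list of matrices, for example the query, key, and value
    projection matrices of one attention layer.
    """
    return coeff * sum(max_singular_value(W) for W in weights)
```

In training, a term like this would be added to the task loss (for example `loss = task_loss + msvp_penalty([W_q, W_k, W_v])`), so that gradient descent discourages large spectral norms in the attention projections. A practical implementation would use a tensor library with automatic differentiation and typically only a few power-iteration steps per update.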
