CV CR LG | Oct 8, 2021

Adversarial Token Attacks on Vision Transformers

Ameya Joshi, Gauri Jagatap, Chinmay Hegde

arXiv:2110.04337v18.023 citations

Originality: Incremental advance

AI Analysis

This paper addresses the robustness of vision transformers in computer vision applications, but its contribution is incremental because it builds on existing adversarial attack research.

The paper investigates differences between vision transformers and convolutional networks by designing adversarial token attacks. It finds that transformers are more sensitive to these attacks than convolutional networks, with ResNets outperforming transformers by up to ~30% in robust accuracy for single token attacks.

Vision transformers rely on a patch-token-based self-attention mechanism, in contrast to convolutional networks. We investigate fundamental differences between these two families of models by designing a block-sparsity-based adversarial token attack. We probe and analyze both transformer and convolutional models with token attacks of varying patch sizes. We infer that transformer models are more sensitive to token attacks than convolutional models, with ResNets outperforming Transformer models by up to $\sim30\%$ in robust accuracy for single token attacks.
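To make the idea of a block-sparse token attack concrete, here is a minimal NumPy sketch of a single attack step. It is not the authors' implementation: the abstract does not specify the algorithm, so the token-selection rule (largest gradient norm per patch), the signed-gradient update, and the function names `top_k_tokens`, `token_mask`, and `token_attack_step` are all assumptions made for illustration. The loss gradient `grad` is assumed to be supplied by whatever framework hosts the model, and the image is assumed to be square, single-channel, and scaled to [0, 1].

```python
import numpy as np


def top_k_tokens(grad, patch_size, k):
    """Return indices of the k patches with the largest gradient L2 norm.

    Assumption: gradient magnitude per patch is used as the saliency score.
    Patches are indexed row-major over the (grid x grid) patch layout.
    """
    grid = grad.shape[0] // patch_size
    # Split the image into (grid, patch, grid, patch) blocks.
    blocks = grad.reshape(grid, patch_size, grid, patch_size)
    norms = np.sqrt((blocks ** 2).sum(axis=(1, 3))).ravel()
    return np.argsort(-norms)[:k]


def token_mask(image_size, patch_size, token_indices):
    """Binary mask: 1 inside the selected patch tokens, 0 elsewhere."""
    grid = image_size // patch_size
    mask = np.zeros((image_size, image_size), dtype=np.float32)
    for idx in token_indices:
        row, col = divmod(int(idx), grid)
        mask[row * patch_size:(row + 1) * patch_size,
             col * patch_size:(col + 1) * patch_size] = 1.0
    return mask


def token_attack_step(x, grad, patch_size, k, step_size):
    """One signed-gradient step restricted to k patch tokens.

    Pixels outside the selected patches are left untouched, which is
    what makes the perturbation block-sparse.
    """
    tokens = top_k_tokens(grad, patch_size, k)
    mask = token_mask(x.shape[0], patch_size, tokens)
    x_adv = np.clip(x + step_size * np.sign(grad) * mask, 0.0, 1.0)
    return x_adv, tokens
```

Setting `k=1` corresponds to the single token setting mentioned in the abstract, and varying `patch_size` mirrors the comparison across patch sizes. A real attack would repeat this step, recomputing the gradient from the model each time.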
