CV, CR · Dec 7, 2021

Decision-based Black-box Attack Against Vision Transformers via Patch-wise Adversarial Removal

arXiv:2112.03492v2 · 38 citations
AI Analysis

This paper addresses the challenge of efficiently attacking ViTs in black-box settings, which matters for evaluating adversarial robustness. The contribution is incremental, however, because it builds on existing decision-based attacks.

The paper tackles the problem of decision-based black-box attacks on Vision Transformers (ViTs) by proposing Patch-wise Adversarial Removal (PAR), which achieves a much lower noise magnitude with the same number of queries compared to existing methods, as demonstrated in experiments on three datasets.

Vision transformers (ViTs) have demonstrated impressive performance and stronger adversarial robustness than Convolutional Neural Networks (CNNs). On the one hand, ViTs' focus on global interaction between individual patches reduces the local noise sensitivity of images. On the other hand, existing decision-based attacks neglect the differences in noise sensitivity between image regions, which further compromises the efficiency of noise compression, especially for ViTs. Therefore, validating the black-box adversarial robustness of ViTs when the target model can only be queried remains a challenging problem. In this paper, we theoretically analyze the limitations of existing decision-based attacks from the perspective of noise sensitivity differences between image regions, and propose a new decision-based black-box attack against ViTs, termed Patch-wise Adversarial Removal (PAR). PAR divides images into patches through a coarse-to-fine search process and compresses the noise on each patch separately. PAR records the noise magnitude and noise sensitivity of each patch and selects the patch with the highest query value for noise compression. In addition, PAR can be used as a noise initialization method for other decision-based attacks, improving their noise compression efficiency on both ViTs and CNNs without introducing additional computation. Extensive experiments on three datasets demonstrate that PAR achieves a much lower noise magnitude with the same number of queries.
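To make the patch-selection loop concrete, here is a minimal Python sketch of a simplified patch-wise removal. It rests on several assumptions that are not taken from the paper. The target model is exposed only through a hypothetical `is_adversarial(image)` oracle that returns the top-1 decision. Images are single-channel lists of floats. A patch's query value is taken to be its noise energy (the sum of squared perturbation). Noise on a patch is removed outright rather than compressed gradually. A patch whose removal breaks the adversarial label is treated as noise-sensitive and split into quadrants (the coarse-to-fine step) until it reaches `min_size`. The names `patchwise_removal`, `noise_energy`, and `split_patch` are illustrative only, and PAR's actual query-value formula and compression schedule are the ones defined in the paper.

```python
import heapq
from typing import Callable, List, Tuple

Image = List[List[float]]
Patch = Tuple[int, int, int, int]  # (row, col, height, width)


def noise_energy(x: Image, x_adv: Image, p: Patch) -> float:
    """Sum of squared perturbation inside patch p (assumed query value)."""
    r, c, h, w = p
    return sum((x_adv[i][j] - x[i][j]) ** 2
               for i in range(r, r + h) for j in range(c, c + w))


def split_patch(p: Patch) -> List[Patch]:
    """Split a patch into up to four non-empty quadrants (coarse-to-fine)."""
    r, c, h, w = p
    h1, w1 = h // 2, w // 2
    parts = [(r, c, h1, w1), (r, c + w1, h1, w - w1),
             (r + h1, c, h - h1, w1), (r + h1, c + w1, h - h1, w - w1)]
    return [q for q in parts if q[2] > 0 and q[3] > 0]


def patchwise_removal(x: Image, x_adv: Image,
                      is_adversarial: Callable[[Image], bool],
                      budget: int, min_size: int = 1) -> Tuple[Image, int]:
    """Hypothetical simplified patch-wise removal; returns (adv image, queries)."""
    x_adv = [row[:] for row in x_adv]  # never mutate the caller's image
    h, w = len(x), len(x[0])
    whole = (0, 0, h, w)
    # Max-heap on noise energy (heapq is a min-heap, so store the negative).
    heap = [(-noise_energy(x, x_adv, whole), whole)]
    queries = 0
    while heap and queries < budget:
        neg_value, p = heapq.heappop(heap)
        if -neg_value == 0:
            continue  # no noise left on this patch; skip without querying
        r, c, ph, pw = p
        candidate = [row[:] for row in x_adv]
        for i in range(r, r + ph):
            for j in range(c, c + pw):
                candidate[i][j] = x[i][j]  # drop the noise on this patch
        queries += 1
        if is_adversarial(candidate):
            x_adv = candidate  # the patch's noise was not needed
        elif ph > min_size or pw > min_size:
            # Removal failed: refine the patch and retry on its quadrants.
            for q in split_patch(p):
                heapq.heappush(heap, (-noise_energy(x, x_adv, q), q))
        # Otherwise the patch is minimal and noise-sensitive: keep its noise.
    return x_adv, queries


if __name__ == "__main__":
    # Toy setup: a 4x4 clean image of zeros, a fully perturbed image of ones,
    # and an oracle that stays "adversarial" while pixel (0, 0) keeps its noise.
    x = [[0.0] * 4 for _ in range(4)]
    x_adv = [[1.0] * 4 for _ in range(4)]
    oracle = lambda img: img[0][0] >= 0.5
    result, used = patchwise_removal(x, x_adv, oracle, budget=100)
    print(used)  # 9
    print(sum(v for row in result for v in row))  # 1.0
```

In the toy run the oracle depends only on pixel (0, 0), so the noise on every other patch is removed and the sensitive pixel keeps its perturbation. Because the heap only ever holds disjoint patches, a successful removal never invalidates the stored priorities of the remaining entries.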

Code Implementations: 1 repo
