Patch-wise Attack for Fooling Deep Neural Network
This addresses the challenge of fooling deep neural networks in black-box settings, offering incremental improvements in adversarial attack transferability.
The paper tackles the problem of creating adversarial examples with strong transferability across models by proposing a patch-wise iterative algorithm, which improves the success rate by 9.2% for defense models and 3.7% for normally trained models compared to state-of-the-art attacks.
By adding human-imperceptible noise to clean images, the resultant adversarial examples can fool other unknown models. Features of a pixel extracted by deep neural networks (DNNs) are influenced by its surrounding regions, and different DNNs generally focus on different discriminative regions in recognition. Motivated by this, we propose a patch-wise iterative algorithm -- a black-box attack towards mainstream normally trained and defense models, which differs from the existing attack methods manipulating pixel-wise noise. In this way, without sacrificing the performance of white-box attack, our adversarial examples can have strong transferability. Specifically, we introduce an amplification factor to the step size in each iteration, and one pixel's overall gradient overflowing the $ε$-constraint is properly assigned to its surrounding regions by a project kernel. Our method can be generally integrated to any gradient-based attack methods. Compared with the current state-of-the-art attacks, we significantly improve the success rate by 9.2\% for defense models and 3.7\% for normally trained models on average. Our code is available at \url{https://github.com/qilong-zhang/Patch-wise-iterative-attack}