Adversarial Attack via Dual-Stage Network Erosion
This work addresses the challenge of adversarial robustness for deep learning models, offering incremental improvements in transferability for black-box attacks.
The paper tackles the problem of generating transferable adversarial examples in black-box settings. It proposes Dual-Stage Network Erosion (DSNE), which applies dual-stage feature-level perturbations to an existing model and fuses the resulting models with a longitudinal ensemble to improve transferability. The reported gains are significant, especially for residual networks, where the information in the residual blocks is biased toward the skip connections.
Deep neural networks are vulnerable to adversarial examples, which can fool deep models by adding subtle perturbations. Although existing attacks have achieved promising results, there is still a long way to go in generating transferable adversarial examples under the black-box setting. To this end, this paper proposes to improve the transferability of adversarial examples by applying dual-stage feature-level perturbations to an existing model, which implicitly creates a set of diverse models. These models are then fused by a longitudinal ensemble across the attack iterations. The proposed method is termed Dual-Stage Network Erosion (DSNE). We conduct comprehensive experiments on both non-residual and residual networks, and obtain more transferable adversarial examples at a computational cost similar to that of the state-of-the-art method. In particular, for residual networks, the transferability of the adversarial examples can be significantly improved by biasing the residual block information toward the skip connections. Our work provides new insights into the architectural vulnerability of neural networks and presents new challenges to their robustness.
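The abstract does not spell out the erosion operators, so the following is only a toy sketch of the general idea, not the authors' implementation. It assumes a tiny elementwise linear "residual network" in which each block computes h + gamma * (w * h). Sampling a fresh gamma below 1 at every attack step weakens the residual branch relative to the skip connection, and because each iteration uses a different eroded model, the iterations together act as an ensemble. All names here (eroded_forward_grad, longitudinal_attack, the range [low, high], the step size eps / steps) are hypothetical choices made for illustration.

```python
import random


def eroded_forward_grad(x, weights, gammas):
    """Toy residual net: each block maps h to h + gamma * (w * h), elementwise.

    Returns (loss, grad), where loss = sum of the final features and
    grad = d(loss)/dx. Because the toy model is linear, the gradient is
    just the product of the per-block scale factors.
    """
    h = list(x)
    grad = [1.0] * len(x)
    for w, g in zip(weights, gammas):
        # gamma < 1 shrinks the residual branch, so relatively more of the
        # signal (and gradient) flows through the skip connection.
        scale = [1.0 + g * wi for wi in w]
        h = [hi * s for hi, s in zip(h, scale)]
        grad = [gi * s for gi, s in zip(grad, scale)]
    return sum(h), grad


def sign(v):
    return (v > 0) - (v < 0)


def longitudinal_attack(x0, weights, eps=0.1, steps=10, low=0.5, high=1.0, seed=0):
    """Iterative sign-gradient ascent on the toy loss.

    A new eroded model is sampled at every step (assumed uniform gamma in
    [low, high]); the perturbation is kept inside an L-infinity ball of
    radius eps around x0.
    """
    rng = random.Random(seed)
    alpha = eps / steps
    x = list(x0)
    for _ in range(steps):
        gammas = [rng.uniform(low, high) for _ in weights]
        _, grad = eroded_forward_grad(x, weights, gammas)
        x = [xi + alpha * sign(gi) for xi, gi in zip(x, grad)]
        # Project back into the eps-ball around the clean input.
        x = [min(max(xi, ci - eps), ci + eps) for xi, ci in zip(x, x0)]
    return x


weights = [[0.5, -0.2], [0.3, 0.1]]  # two blocks, two features (made-up values)
x_clean = [1.0, -1.0]
x_adv = longitudinal_attack(x_clean, weights)

full = [1.0] * len(weights)  # un-eroded model used for evaluation
loss_clean, _ = eroded_forward_grad(x_clean, weights, full)
loss_adv, _ = eroded_forward_grad(x_adv, weights, full)
print(loss_clean, loss_adv)
```

In this sketch the adversarial input is evaluated on the un-eroded model (all gammas equal to 1), loosely mirroring the setting where examples crafted on perturbed variants are then tested on a different model. A real attack would replace the hand-written gradient with automatic differentiation on an actual network and would use the perturbation operators defined in the paper.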