Yet Another Intermediate-Level Attack
This work addresses the challenge of creating more effective adversarial attacks for security testing, but the contribution appears incremental because it refines the output of existing baseline attacks.
The paper tackles the problem of improving the black-box transferability of adversarial examples in deep neural networks. It proposes a method that uses intermediate-level discrepancies to predict adversarial loss, and it reports state-of-the-art results on CIFAR-100 and ImageNet.
The transferability of adversarial examples across deep neural network (DNN) models is the crux of a spectrum of black-box attacks. In this paper, we propose a novel method to enhance the black-box transferability of baseline adversarial examples. By establishing a linear mapping of the intermediate-level discrepancies (between a set of adversarial inputs and their benign counterparts) for predicting the evoked adversarial loss, we aim to take full advantage of the optimization procedure of multi-step baseline attacks. We conducted extensive experiments to verify the effectiveness of our method on CIFAR-100 and ImageNet. Experimental results demonstrate that it considerably outperforms previous state-of-the-art methods. Our code is at https://github.com/qizhangli/ila-plus-plus.
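The linear mapping described in the abstract can be illustrated with a small sketch. The abstract only states that a linear mapping from intermediate-level discrepancies to adversarial loss is established. The choice of ridge regression, the regularization value, the function names (`fit_linear_map`, `predict_loss`), and the toy numbers below are assumptions made for illustration, not the released implementation. In practice, each discrepancy vector would presumably be the flattened difference between a source model's intermediate features for an adversarial input (one per step of the multi-step baseline attack) and for its benign counterpart, and a numerical library would replace the hand-written solver.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_linear_map(discrepancies, losses, reg=1.0):
    """Ridge-regression fit of w so that dot(h_t, w) approximates loss_t.

    Uses the dual form w = H^T (H H^T + reg * I)^{-1} r, which only needs a
    T x T solve (T = number of attack steps), assumed small relative to the
    feature dimension.
    """
    T = len(discrepancies)
    gram = [
        [dot(discrepancies[i], discrepancies[j]) + (reg if i == j else 0.0)
         for j in range(T)]
        for i in range(T)
    ]
    alpha = solve(gram, losses)
    d = len(discrepancies[0])
    return [sum(alpha[t] * discrepancies[t][k] for t in range(T)) for k in range(d)]


def predict_loss(w, discrepancy):
    """Predicted adversarial loss for a given intermediate-level discrepancy."""
    return dot(w, discrepancy)


# Toy data (hypothetical): two attack steps, three-dimensional features.
H = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]   # discrepancies, one row per step
r = [0.5, 1.5]                            # adversarial loss observed at each step
w = fit_linear_map(H, r, reg=1e-9)
print([predict_loss(w, h) for h in H])
```

Under these assumptions, the fitted vector `w` acts as a direction in feature space: a perturbation whose intermediate-level discrepancy has a larger projection onto `w` is predicted to evoke a larger adversarial loss, which is the kind of objective an intermediate-level attack could then maximize.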