Meta Gradient Adversarial Attack
This work addresses the transferability of adversarial attacks, a machine learning security problem, and represents an incremental improvement over existing methods.
The paper aims to improve the transferability of adversarial attacks to unseen black-box models. It proposes Meta Gradient Adversarial Attack (MGAA), a plug-and-play architecture that integrates with existing gradient-based attack methods and is reported to outperform state-of-the-art methods on the CIFAR10 and ImageNet datasets.
In recent years, adversarial attacks have become an active research area. Although the current literature on transfer-based adversarial attacks has achieved promising results in improving transferability to unseen black-box models, there is still considerable room for improvement. Inspired by the idea of meta-learning, this paper proposes a novel architecture called Meta Gradient Adversarial Attack (MGAA), which is plug-and-play and can be integrated with any existing gradient-based attack method to improve cross-model transferability. Specifically, we randomly sample multiple models from a model zoo to compose different tasks, and in each task we iteratively simulate a white-box attack and a black-box attack. By narrowing the gap between the gradient directions in the white-box and black-box attacks, the transferability of adversarial examples in the black-box setting can be improved. Extensive experiments on the CIFAR10 and ImageNet datasets show that our architecture outperforms the state-of-the-art methods in both black-box and white-box attack settings.
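The task-sampling procedure described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the paper's implementation: the model zoo is a list of toy linear softmax classifiers, the base attack is a sign-gradient step, and the names `loss_grad`, `mgaa_sketch`, `n_train`, `k_steps`, `alpha`, and `beta` are hypothetical. The abstract does not state the exact update rule, so the choice below (carry forward only the simulated black-box step) is one plausible reading.

```python
import numpy as np

def loss_grad(W, x, y):
    """Gradient of softmax cross-entropy w.r.t. input x for a toy linear model (logits = W @ x)."""
    z = W @ x
    p = np.exp(z - z.max())
    p /= p.sum()
    p[y] -= 1.0           # softmax(z) - onehot(y)
    return W.T @ p

def mgaa_sketch(x, y, zoo, eps=0.1, tasks=5, n_train=2, k_steps=3, seed=0):
    """Illustrative meta-gradient attack loop; hyperparameters are assumptions."""
    rng = np.random.default_rng(seed)
    alpha = eps / tasks   # assumed step size for the simulated white-box attack
    beta = eps / tasks    # assumed step size for the simulated black-box attack
    x_adv = x.copy()
    for _ in range(tasks):
        # Compose one task: a few models play "white-box", one plays "black-box".
        idx = rng.permutation(len(zoo))
        train, test = idx[:n_train], idx[n_train]

        # Simulated white-box attack on the ensemble of sampled models.
        x_in = x_adv.copy()
        for _ in range(k_steps):
            g = sum(loss_grad(zoo[j], x_in, y) for j in train)
            x_in = np.clip(x_in + alpha * np.sign(g), x - eps, x + eps)

        # Simulated black-box attack: gradient of the held-out model,
        # evaluated at the white-box result.
        g_test = loss_grad(zoo[test], x_in, y)

        # Assumed update: only the black-box step is added to the running example.
        x_adv = np.clip(x_adv + beta * np.sign(g_test), x - eps, x + eps)
    return x_adv

# Toy usage with random linear "models" (3 classes, 4 input features).
demo_rng = np.random.default_rng(1)
zoo = [demo_rng.normal(size=(3, 4)) for _ in range(4)]
x = demo_rng.normal(size=4)
x_adv = mgaa_sketch(x, y=0, zoo=zoo)
```

In this sketch the held-out model's gradient is taken at the point reached by the white-box steps, which is one way to couple the two gradient directions; the perturbation is kept inside an L-infinity ball of radius `eps` around the clean input. A real attack would replace `loss_grad` with backpropagation through actual networks.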