Improving Black-box Adversarial Attacks with a Transfer-based Prior
This work addresses the challenge of generating effective adversarial perturbations without access to the target model's gradients, which matters for security testing of machine learning systems. Its contribution is an incremental improvement over existing attack methods.
The paper tackles the low success rates and poor query efficiency of black-box adversarial attacks. It proposes a prior-guided random gradient-free method that integrates a transfer-based prior with query information, achieving higher success rates with significantly fewer queries.
We consider the black-box adversarial setting, where the adversary has to generate adversarial perturbations without access to the target models to compute gradients. Previous methods approximate the gradient either with the transfer gradient of a surrogate white-box model or from query feedback. However, these methods often suffer from low attack success rates or poor query efficiency, since it is non-trivial to estimate the gradient in a high-dimensional space with limited information. To address these problems, we propose a prior-guided random gradient-free (P-RGF) method to improve black-box adversarial attacks, which takes advantage of a transfer-based prior and the query information simultaneously. The transfer-based prior, given by the gradient of a surrogate model, is integrated into our algorithm through an optimal coefficient derived from a theoretical analysis. Extensive experiments demonstrate that our method requires far fewer queries to attack black-box models, with higher success rates, than alternative state-of-the-art methods.
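The idea of combining a transfer-based prior with query feedback can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `prgf_estimate`, the sampling radius `sigma`, the query count `q`, and the toy linear loss are all assumptions made for illustration. In particular, the abstract states that the mixing coefficient is derived optimally from theory, whereas this sketch simply uses a hand-set constant `lam`. Each query direction mixes the normalized prior with a random direction orthogonal to it, and the gradient is estimated from finite differences of the loss along those directions.

```python
import numpy as np

def prgf_estimate(f, x, prior, lam=0.5, q=20, sigma=1e-4, rng=None):
    """Sketch of a prior-guided random gradient-free estimate of grad f(x).

    f      : black-box loss, called once at x and once per sampled direction
    prior  : transfer gradient from a surrogate model (same shape as x)
    lam    : weight on the prior direction, in [0, 1] (hand-set here;
             an assumption, not the optimal coefficient from the paper)
    """
    rng = np.random.default_rng() if rng is None else rng
    v = prior / np.linalg.norm(prior)       # unit-norm prior direction
    fx = f(x)
    g = np.zeros_like(x)
    for _ in range(q):
        xi = rng.standard_normal(x.shape)
        xi -= xi.dot(v) * v                 # remove the component along v
        xi /= np.linalg.norm(xi)            # unit vector orthogonal to v
        u = np.sqrt(lam) * v + np.sqrt(1.0 - lam) * xi   # unit-norm query direction
        g += (f(x + sigma * u) - fx) / sigma * u         # finite-difference term
    return g / q

# Toy usage: a linear loss whose true gradient is `a`, with a noisy prior.
rng = np.random.default_rng(0)
a = rng.standard_normal(100)
loss = lambda z: float(a @ z)
x0 = np.zeros(100)
noisy_prior = a + 0.5 * rng.standard_normal(100)
est = prgf_estimate(loss, x0, noisy_prior, rng=rng)
cos = est @ a / (np.linalg.norm(est) * np.linalg.norm(a))
```

Here `cos` measures how well the estimate aligns with the true gradient of the toy loss. Setting `lam` closer to 1 trusts the surrogate more, while smaller values rely more on the random queries.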