Query-Efficient Black-box Adversarial Attacks Guided by a Transfer-based Prior
This work addresses the vulnerability of deep learning models to adversarial attacks in black-box settings and improves both query efficiency and success rates, though the contribution is incremental because it builds on existing gradient estimation methods.
The paper tackles the problem of low success rates and poor query efficiency in black-box adversarial attacks by proposing two prior-guided random gradient-free algorithms that integrate transfer-based priors with model queries, achieving higher success rates with significantly fewer queries compared to state-of-the-art methods.
Adversarial attacks have been extensively studied in recent years because they can expose the vulnerabilities of deep learning models before the models are deployed. In this paper, we consider the black-box adversarial setting, where the adversary must craft adversarial examples without access to the gradients of the target model. Previous methods attempted to approximate the true gradient either by using the transfer gradient of a surrogate white-box model or from the feedback of model queries. However, existing methods inevitably suffer from low attack success rates or poor query efficiency, since it is difficult to estimate the gradient in a high-dimensional input space with limited information. To address these problems and improve black-box attacks, we propose two prior-guided random gradient-free (PRGF) algorithms, based on biased sampling and gradient averaging, respectively. Both methods exploit a transfer-based prior, given by the gradient of a surrogate model, and the query information simultaneously. Guided by theoretical analysis, each method integrates the transfer-based prior with the model queries through an optimal coefficient. Extensive experiments demonstrate that, compared with alternative state-of-the-art methods, both of our methods require far fewer queries to attack black-box models while achieving higher success rates.
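To make the biased-sampling idea concrete, the following is a minimal sketch (standard-library Python) of a random gradient-free estimator whose random search directions are tilted toward a prior vector, such as a surrogate model's gradient. It is an illustration under stated assumptions, not the paper's implementation: the function name `biased_rgf_gradient` is hypothetical, the bias weight `lam` in [0, 1] is treated as a fixed input (the paper derives an optimal coefficient, which is not reproduced here), and the defaults for the finite-difference step `sigma` and query count `q` are illustrative. The gradient-averaging variant is not shown.

```python
import math
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _scale(a, s):
    return [s * x for x in a]


def _add(a, b):
    return [x + y for x, y in zip(a, b)]


def biased_rgf_gradient(loss, x, sigma=1e-4, q=10, prior=None, lam=0.0, rng=None):
    """Hypothetical sketch: estimate grad loss(x) using only loss queries.

    Each direction is u = sqrt(lam) * v + sqrt(1 - lam) * xi, where v is the
    normalized prior and xi is a random unit vector orthogonal to v, so u has
    unit norm. lam = 0 ignores the prior direction; lam = 1 queries only along
    it. Assumes len(x) >= 2 when a prior is given.
    """
    rng = rng or random.Random(0)
    d = len(x)
    fx = loss(x)  # one query at the current point
    v = None
    if prior is not None:
        v = _scale(prior, 1.0 / math.sqrt(_dot(prior, prior)))
    g = [0.0] * d
    for _ in range(q):
        xi = [rng.gauss(0.0, 1.0) for _ in range(d)]
        if v is not None:
            # Remove the component along the prior so xi is orthogonal to v.
            xi = _add(xi, _scale(v, -_dot(xi, v)))
        xi = _scale(xi, 1.0 / math.sqrt(_dot(xi, xi)))
        if v is not None:
            u = _add(_scale(v, math.sqrt(lam)), _scale(xi, math.sqrt(1.0 - lam)))
        else:
            u = xi
        # Forward finite difference along u: one additional query.
        diff = (loss(_add(x, _scale(u, sigma))) - fx) / sigma
        g = _add(g, _scale(u, diff))
    return _scale(g, 1.0 / q)
```

For example, with a linear loss `f(z) = w . z` and an exact prior `prior=w`, setting `lam=1.0` recovers `w` from the queries alone, while `lam=0.0` spends every query on directions orthogonal to the prior. Intermediate values trade off trust in the surrogate gradient against exploration, which is the balance the optimal coefficient is meant to set.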