Spanning Attack: Reinforce Black-box Attacks with Unlabeled Data
This work addresses the issue of false robustness assessments in machine learning models by enhancing attack efficiency, though it is incremental as it builds on existing black-box methods.
The paper tackles the problem of query inefficiency in black-box adversarial attacks by proposing the spanning attack, which constrains perturbations in a low-dimensional subspace using unlabeled data, resulting in significant improvements in query efficiency across various attack types.
Adversarial black-box attacks aim to craft adversarial perturbations by querying input-output pairs of machine learning models. They are widely used to evaluate the robustness of pre-trained models. However, black-box attacks often suffer from the issue of query inefficiency due to the high dimensionality of the input space, and therefore incur a false sense of model robustness. In this paper, we relax the conditions of the black-box threat model, and propose a novel technique called the spanning attack. By constraining adversarial perturbations in a low-dimensional subspace via spanning an auxiliary unlabeled dataset, the spanning attack significantly improves the query efficiency of a wide variety of existing black-box attacks. Extensive experiments show that the proposed method works favorably in both soft-label and hard-label black-box attacks. Our code is available at https://github.com/wangwllu/spanning_attack.