LG · AI · Apr 11, 2025

On Transfer-based Universal Attacks in Pure Black-box Setting

arXiv:2504.08866v1 · h-index: 32
Originality: Incremental advance
AI Analysis

This work addresses the need for more transparent and realistic evaluations of adversarial attacks in machine learning security, though it is incremental in refining existing paradigms.

The paper tackles the problem of transferable black-box adversarial attacks on deep visual models by identifying that existing methods rely on priors that violate black-box assumptions, and it proposes a framework for prior-free analysis, showing that priors cause overestimation in transferability scores.

Despite their impressive performance, deep visual models are susceptible to transferable black-box adversarial attacks. In principle, these attacks craft perturbations in a manner agnostic to the target model. Surprisingly, however, we find that existing methods in this domain inadvertently rely on various priors that violate the black-box assumption, such as access to the dataset used to train the target model and knowledge of the number of classes in the target model. Consequently, the literature fails to articulate the true potency of transferable black-box attacks. We provide an empirical study of these biases and propose a framework that supports a prior-free, transparent study of this paradigm. Using our framework, we analyze how prior knowledge of the target model's data and number of classes affects attack performance. We also provide several insights based on our analysis and demonstrate that priors cause overestimation in transferability scores. Finally, we extend our framework to query-based attacks. This extension inspires a novel image-blending technique to prepare data for effective surrogate model training.
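
To make "transferability score" concrete, here is a minimal sketch of one common way such a score is computed for universal perturbations: the fooling rate, i.e. the fraction of inputs whose target-model prediction changes once the perturbation is added. This is an assumption for illustration only; the abstract does not state which metric the paper uses, and the function name `fooling_rate` and the label lists below are hypothetical.

```python
from typing import Sequence


def fooling_rate(clean_preds: Sequence[int], adv_preds: Sequence[int]) -> float:
    """Fraction of inputs whose predicted label changes after perturbation.

    Illustrative metric only (an assumption, not taken from the paper).
    """
    if len(clean_preds) != len(adv_preds):
        raise ValueError("prediction lists must have the same length")
    if len(clean_preds) == 0:
        raise ValueError("need at least one prediction")
    # Count the positions where the target model's label flipped.
    flipped = sum(c != a for c, a in zip(clean_preds, adv_preds))
    return flipped / len(clean_preds)


# Hypothetical target-model labels before and after adding one
# universal perturbation crafted on a surrogate model.
clean = [3, 1, 4, 1, 5, 9, 2, 6]
adv = [3, 7, 4, 0, 5, 2, 2, 8]
rate = fooling_rate(clean, adv)
```

Under this reading, an evaluation that crafts the perturbation with access to the target's training data or class count could report a higher `rate` than a prior-free evaluation, which is the kind of overestimation the abstract describes.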
