LG CV Nov 25, 2024

Scaling Laws for Black box Adversarial Attacks

arXiv:2411.16782v39 citations
h-index: 41
Originality: Incremental advance
AI Analysis

This work addresses the threat that black-box adversarial attacks pose in commercial settings, showing incremental gains in transferability from scaling the number of surrogate models in the attack ensemble.

The paper investigates whether increasing the number of surrogate models in ensemble-based adversarial attacks improves transferability to black-box models. It reports clear scaling laws: transfer attack success rates rise as surrogate models are added, exceeding 90% on proprietary models such as GPT-4o.

Adversarial examples usually exhibit good cross-model transferability, enabling attacks on black-box models with limited information about their architectures and parameters, which are highly threatening in commercial black-box scenarios. Model ensembling is an effective strategy to improve the transferability of adversarial examples by attacking multiple surrogate models. However, since prior studies usually adopt few models in the ensemble, there remains an open question of whether scaling the number of models can further improve black-box attacks. Inspired by the scaling law of large foundation models, we investigate the scaling laws of black-box adversarial attacks in this work. Through theoretical analysis and empirical evaluations, we conclude with clear scaling laws that using more surrogate models enhances adversarial transferability. Comprehensive experiments verify the claims on standard image classifiers, diverse defended models and multimodal large language models using various adversarial attack methods. Specifically, by scaling law, we achieve 90%+ transfer attack success rate on even proprietary models like GPT-4o. Further visualization indicates that there is also a scaling law on the interpretability and semantics of adversarial perturbations.
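To make the ensemble idea in the abstract concrete, here is a minimal sketch of one gradient-sign step taken against several surrogate models at once. It is an illustration under stated assumptions, not the paper's method: the surrogates are toy linear softmax classifiers with random weights, the attack is a single FGSM-style step, and the names `ensemble_fgsm`, `mean_loss`, and `surrogates` are hypothetical. The paper itself evaluates real image classifiers, defended models, and multimodal LLMs with various attack methods.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def mean_loss(x, y, weights):
    """Cross-entropy of label y, averaged over all surrogate models."""
    return float(np.mean([-np.log(softmax(W @ x)[y]) for W in weights]))


def ensemble_fgsm(x, y, weights, eps):
    """One FGSM-style step against an ensemble of linear surrogates.

    x: (d,) input, y: true label index, weights: list of (k, d) matrices
    (one per toy surrogate), eps: L-infinity perturbation budget.
    """
    grad = np.zeros_like(x)
    for W in weights:
        p = softmax(W @ x)
        p[y] -= 1.0          # d(cross-entropy)/d(logits) = softmax - one_hot
        grad += W.T @ p      # chain rule back to the input
    grad /= len(weights)     # gradient of the ensemble-averaged loss
    return x + eps * np.sign(grad)


# Toy setup (assumed for illustration): 4 random surrogates, 3 classes, 8 features.
rng = np.random.default_rng(0)
x = rng.normal(size=8)
surrogates = [rng.normal(size=(3, 8)) for _ in range(4)]
x_adv = ensemble_fgsm(x, y=0, weights=surrogates, eps=0.1)
```

Adding entries to `surrogates` is the knob the paper scales. In this sketch the perturbation only raises the averaged surrogate loss. Whether it transfers to an unseen model is the empirical question the paper studies.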
