Transferable Adversarial Examples with Bayes Approach
This work addresses an important issue in trustworthy AI: making adversarial attacks more effective across different models. The contribution is incremental, but it offers strong gains on that specific problem.
The paper tackles poor cross-model transferability in black-box adversarial attacks. It introduces BayAtk, a method that uses a Bayesian approach to design transferability-promoting priors together with an adaptive weighting strategy. The resulting adversarial examples are significantly more transferable against both undefended and defended models than those produced by state-of-the-art attacks.
The vulnerability of deep neural networks (DNNs) to black-box adversarial attacks is one of the most actively studied topics in trustworthy AI. In such attacks, the attacker operates without any insider knowledge of the target model, which makes the cross-model transferability of adversarial examples critical. Although adversarial examples can in principle be effective across various models, examples crafted against one specific model have been observed to transfer poorly to others. In this paper, we explore the transferability of adversarial examples through the lens of a Bayesian approach. Specifically, we use the Bayesian approach to probe transferability and then study what constitutes a transferability-promoting prior. Following this, we design two concrete transferability-promoting priors, along with an adaptive dynamic weighting strategy for instances sampled from these priors. Combining these techniques, we present BayAtk. Extensive experiments show that BayAtk crafts significantly more transferable adversarial examples against both undefended and defended black-box models than existing state-of-the-art attacks.
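The abstract does not specify the two priors or the weighting rule, so the following is only a minimal sketch of the general idea of averaging gradients over instances sampled from a prior and weighting those instances adaptively. Everything concrete here is an assumption made for illustration and is not taken from BayAtk: the toy linear surrogate model, the isotropic Gaussian prior over input perturbations, the softmax-of-loss weights, and the names `loss_and_grad` and `weighted_prior_attack`.

```python
import numpy as np


def loss_and_grad(w, x, y):
    """Logistic loss of a toy linear surrogate and its gradient w.r.t. the input x.

    Assumption: the surrogate is a linear classifier with weights w and the
    label y is in {-1, +1}. A real attack would use a DNN and autodiff.
    """
    z = y * np.dot(w, x)
    loss = np.log1p(np.exp(-z))
    grad = -y * w / (1.0 + np.exp(z))  # d(loss)/dx
    return loss, grad


def weighted_prior_attack(w, x, y, eps=0.1, steps=10, n_samples=8,
                          sigma=0.05, seed=0):
    """Iterative sign-gradient attack whose gradient is a weighted average
    over instances sampled from a prior (hypothetical stand-in, not BayAtk)."""
    rng = np.random.default_rng(seed)
    alpha = eps / steps  # per-step size
    x_adv = x.copy()
    for _ in range(steps):
        losses, grads = [], []
        for _ in range(n_samples):
            # Assumed prior: isotropic Gaussian noise around the current input.
            noise = rng.normal(0.0, sigma, size=x.shape)
            sample_loss, sample_grad = loss_and_grad(w, x_adv + noise, y)
            losses.append(sample_loss)
            grads.append(sample_grad)
        losses = np.array(losses)
        grads = np.array(grads)
        # Assumed weighting: softmax over sampled losses, so instances with
        # higher loss contribute more to the averaged gradient.
        weights = np.exp(losses - losses.max())
        weights /= weights.sum()
        avg_grad = (weights[:, None] * grads).sum(axis=0)
        # Ascend the loss, then project back into the L-infinity ball.
        x_adv = x_adv + alpha * np.sign(avg_grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv
```

Calling `weighted_prior_attack(w, x, y)` with a weight vector `w`, an input `x`, and a label `y` returns a perturbed input that stays within `eps` of `x` in the L-infinity norm. Averaging over sampled instances, rather than using a single gradient of the surrogate, is the part meant to reduce overfitting to that one model.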