Enhance transferability of adversarial examples with model architecture
This work addresses the poor transferability of adversarial examples in black-box attacks, a problem that is critical for security testing, although work in this area is often incremental in approach.
The paper tackles the overfitting of adversarial examples to proxy models in black-box attacks by proposing a multi-track model architecture (MMA) that reduces reliance on model-specific features, yielding up to 40% higher transferability than other state-of-the-art model architectures at comparable overhead.
The transferability of adversarial examples is critical for launching black-box adversarial attacks, in which attackers can access only the output of the target model. However, in this challenging but practical setting, the crafted adversarial examples are prone to overfitting to the proxy model employed and therefore transfer poorly. In this paper, we suggest alleviating the overfitting issue from a novel perspective, i.e., designing a suitable model architecture. Specifically, after delving into the root cause of poor transferability, we decompose and reconstruct an existing model architecture into a more effective one, namely the multi-track model architecture (MMA). Adversarial examples crafted on the MMA maximally reduce the influence of model-specific features and move toward the vulnerable directions shared by diverse architectures. Extensive experimental evaluation demonstrates that the transferability of adversarial examples crafted on the MMA significantly surpasses that of other state-of-the-art model architectures by up to 40% with comparable overhead.
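To make the notion of transferability concrete, the following minimal sketch crafts adversarial examples on a proxy model and measures how often they also fool a separate target model. It is a toy illustration only: the binary linear classifiers, the FGSM-style perturbation, and the names `predict`, `fgsm_linear`, and `transfer_rate` are assumptions chosen for illustration, not the MMA or the evaluation protocol used in the paper.

```python
def predict(w, b, x):
    """Toy binary linear classifier: 1 if w.x + b > 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def sign(v):
    """Sign of a number as -1, 0, or 1."""
    return (v > 0) - (v < 0)


def fgsm_linear(w, x, y, eps):
    """FGSM-style step against a linear proxy (assumed stand-in attack).

    For a linear score w.x + b the gradient w.r.t. x is w, so each
    coordinate moves by eps in the direction that pushes the score
    away from the true label y.
    """
    direction = -1.0 if y == 1 else 1.0
    return [xi + direction * eps * sign(wi) for wi, xi in zip(w, x)]


def transfer_rate(proxy, target, samples, eps):
    """Fraction of inputs the target classifies correctly whose
    adversarial versions, crafted only on the proxy, fool the target."""
    w_p, _ = proxy  # the attack only needs the proxy's weights
    w_t, b_t = target
    fooled = total = 0
    for x, y in samples:
        if predict(w_t, b_t, x) != y:
            continue  # skip inputs the target already gets wrong
        x_adv = fgsm_linear(w_p, x, y, eps)
        total += 1
        fooled += predict(w_t, b_t, x_adv) != y
    return fooled / total if total else 0.0


samples = [([1.0, 1.0], 1), ([-1.0, -1.0], 0)]
proxy = ([1.0, 1.0], 0.0)
similar_target = ([2.0, 1.0], 0.0)      # relies on features like the proxy's
dissimilar_target = ([1.0, -1.0], 0.5)  # relies on different features

print(transfer_rate(proxy, similar_target, samples, eps=1.5))     # 1.0
print(transfer_rate(proxy, dissimilar_target, samples, eps=1.5))  # 0.0
```

In this toy setting the perturbation transfers fully to a target whose decision boundary resembles the proxy's and not at all to one that depends on different features, which is the overfitting-to-the-proxy behavior the abstract describes.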