CL · AI · May 24

SEP-Attack: A Simple and Effective Paradigm for Transfer-Based Textual Adversarial Attack

Han Liu, Zhi Xu, Xiaotong Zhang, Feng Zhang, Xiaoming Xu, Wei Wang, Fenglong Ma, Hong Yu

arXiv:2605.24958 · 12.9

AI Analysis

This work addresses the under-explored problem of transferable adversarial attacks on text, which is crucial for assessing the robustness of NLP models in black-box settings.

SEP-Attack proposes a new paradigm for transfer-based textual adversarial attacks that uses a Determinantal Point Process (DPP) to generate diverse surrogate ensemble weights. It achieves significant improvements over state-of-the-art baselines on four datasets and two real-world APIs.
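The summary states that a DPP is used to produce diverse ensemble weights but does not say how. As an illustration only, the sketch below assumes one plausible setup: draw candidate weight vectors from a Dirichlet distribution, measure their pairwise similarity with an RBF kernel, and greedily pick a subset that approximately maximizes the DPP log-determinant (greedy MAP inference). The function names, the Dirichlet proposal, the RBF kernel, and `gamma` are hypothetical choices, not the authors' implementation.

```python
import numpy as np


def rbf_kernel(x: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Similarity matrix S[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(x**2, axis=1)
    # Clip tiny negatives caused by floating-point cancellation.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    return np.exp(-gamma * d2)


def greedy_dpp_map(L: np.ndarray, k: int) -> list[int]:
    """Greedy MAP for a DPP: add the item that maximizes log det(L_S) at each step."""
    selected: list[int] = []
    for _ in range(k):
        best, best_val = -1, -np.inf
        for i in range(L.shape[0]):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_val:
                best, best_val = i, logdet
        if best < 0:  # no remaining item keeps L_S positive definite
            break
        selected.append(best)
    return selected


def diverse_ensemble_weights(num_submodels: int, num_candidates: int = 50,
                             k: int = 5, gamma: float = 10.0,
                             seed: int = 0) -> np.ndarray:
    """Hypothetical helper: return k diverse weight vectors, one weight per submodel."""
    rng = np.random.default_rng(seed)
    # Assumed proposal: candidate weights on the simplex (each row sums to 1).
    cands = rng.dirichlet(np.ones(num_submodels), size=num_candidates)
    # Similarity-only kernel plus a small jitter so every subset is positive definite.
    L = rbf_kernel(cands, gamma) + 1e-6 * np.eye(num_candidates)
    return cands[greedy_dpp_map(L, k)]


if __name__ == "__main__":
    weights = diverse_ensemble_weights(num_submodels=3, k=4)
    print(weights.round(3))  # 4 rows, one weight per submodel
```

Because this kernel encodes similarity only, the selection simply favors weight vectors that are spread out over the simplex. A DPP kernel can also carry a per-item quality score, which would let an estimate of each weighting's usefulness trade off against diversity.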

Despite the strong performance of deep neural networks in modern Web and language applications, they remain vulnerable to adversarial attacks, especially transferable attacks that generate adversarial examples using surrogate models without accessing the victim model. Transferable attacks in the text domain remain under-explored: only a few studies address this challenging issue, and they often achieve suboptimal results because they treat all submodels equally or estimate importance scores inaccurately. To address these challenges, we propose a simple yet effective paradigm for transfer-based textual adversarial attack, named SEP-Attack. Specifically, we employ the Determinantal Point Process (DPP) to generate diverse surrogate ensemble weights, representing the transferability of submodels. Using these weights, we introduce a new metric to evaluate prediction confidence scores, which in turn are used to calculate word importance scores and generate adversarial candidates. Finally, we quantify the transferability score for each candidate and select the top ones as the final transferable adversarial examples. Experiments conducted on four datasets and two real-world APIs validate the efficacy of SEP-Attack, which significantly outperforms state-of-the-art baselines.
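The abstract outlines a pipeline of weighted confidence, word importance, candidate generation, and transferability ranking. The sketch below runs that flow end to end on toy models, but every concrete choice is an assumption for illustration, since the abstract does not define the metrics. Here the confidence metric is the weighted probability of the true label averaged over all weight vectors, word importance is the confidence drop when a word is deleted, candidates come from a caller-supplied synonym table, and the transferability score is one minus the ensemble confidence. All function names, including the keyword-based toy submodels, are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

import numpy as np

# Hypothetical interface: a submodel maps a token list to class probabilities.
Submodel = Callable[[Sequence[str]], np.ndarray]


def ensemble_confidence(tokens, label, submodels: Sequence[Submodel], weight_sets) -> float:
    """Assumed metric: weighted P(true label), averaged over all weight vectors."""
    p = np.array([m(tokens)[label] for m in submodels])  # shape (M,)
    return float(np.mean(weight_sets @ p))               # (K, M) @ (M,) -> (K,)


def word_importance(tokens, label, submodels, weight_sets) -> np.ndarray:
    """Importance of word i = confidence drop when word i is deleted."""
    base = ensemble_confidence(tokens, label, submodels, weight_sets)
    drops = []
    for i in range(len(tokens)):
        reduced = list(tokens[:i]) + list(tokens[i + 1:])
        drops.append(base - ensemble_confidence(reduced, label, submodels, weight_sets))
    return np.array(drops)


def generate_candidates(tokens, importance, synonyms: Dict[str, List[str]],
                        num_words: int = 1) -> List[List[str]]:
    """Substitute the most important words using a caller-supplied synonym table."""
    cands = []
    for i in np.argsort(-importance)[:num_words]:
        for s in synonyms.get(tokens[i], []):
            cands.append(list(tokens[:i]) + [s] + list(tokens[i + 1:]))
    return cands


def select_transferable(candidates, label, submodels, weight_sets, top_n=1):
    """Assumed transferability score: 1 - ensemble confidence; keep the top_n."""
    scored = [(1.0 - ensemble_confidence(c, label, submodels, weight_sets), c)
              for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


def make_keyword_model(positive_words) -> Submodel:
    """Toy binary sentiment model: more positive keywords -> higher P(positive)."""
    def model(tokens):
        hits = sum(t in positive_words for t in tokens)
        p_pos = hits / (hits + 1)
        return np.array([1.0 - p_pos, p_pos])
    return model


if __name__ == "__main__":
    submodels = [make_keyword_model({"great", "good"}),
                 make_keyword_model({"great", "fine"})]
    weight_sets = np.array([[0.5, 0.5], [0.8, 0.2]])  # K=2 illustrative weight vectors
    tokens, label = ["a", "great", "movie"], 1
    imp = word_importance(tokens, label, submodels, weight_sets)
    cands = generate_candidates(tokens, imp, {"great": ["good", "fine", "decent"]})
    for score, cand in select_transferable(cands, label, submodels, weight_sets, top_n=2):
        print(round(score, 3), " ".join(cand))
```

In this toy run, "great" is the only word whose deletion lowers the ensemble confidence, so it is the one substituted. The candidate that fools both submodels ("decent") is ranked first, ahead of those that fool only one.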
