A Context-Aware Approach for Textual Adversarial Attack through Probability Difference Guided Beam Search
This work addresses the vulnerability of text classifiers to adversarial attacks, a problem central to improving robustness in natural language processing. The contribution is incremental, as it builds on existing context-aware methods.
The paper tackles the limited attack efficiency of context-aware textual adversarial attacks by proposing PDBS, a model that uses probability difference guided beam search. PDBS achieves an increase of up to 19.5% in attack success rate over the previous best models.
Textual adversarial attacks expose the vulnerabilities of text classifiers and can be used to improve their robustness. Existing context-aware methods consider only the gold label probability and rely on greedy search to find an attack path, which often limits attack efficiency. To tackle these issues, we propose PDBS, a context-aware textual adversarial attack model using Probability Difference guided Beam Search. The probability difference takes all class label probabilities into account, and PDBS uses it to guide the selection of attack paths. In addition, PDBS uses beam search to find a successful attack path, which avoids the restricted search space of greedy search. Extensive experiments and human evaluation demonstrate that PDBS outperforms the previous best models on a series of evaluation metrics, most notably improving the attack success rate by up to 19.5%. Ablation studies and qualitative analyses further confirm the efficiency of PDBS.
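The search idea described above can be illustrated with a minimal sketch. The abstract does not give the exact definition of the probability difference, so the code assumes one plausible form: the highest non-gold class probability minus the gold class probability. The names `probability_difference`, `beam_search_attack`, `toy_classify`, and the fixed substitution dictionary are hypothetical stand-ins. A context-aware attack would draw candidate words from a language model conditioned on the sentence rather than from a lookup table.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def probability_difference(probs: Sequence[float], gold: int) -> float:
    """Assumed score: best non-gold probability minus gold probability.

    A positive value means the classifier no longer predicts the gold label.
    """
    best_other = max(p for i, p in enumerate(probs) if i != gold)
    return best_other - probs[gold]


def beam_search_attack(
    tokens: List[str],
    gold: int,
    classify: Callable[[List[str]], Sequence[float]],
    candidates: Dict[str, List[str]],
    beam_width: int = 3,
    max_steps: int = 5,
) -> Optional[List[str]]:
    """Return the first sentence found that flips the prediction, else None."""
    beam: List[Tuple[float, List[str]]] = [
        (probability_difference(classify(tokens), gold), list(tokens))
    ]
    seen = {tuple(tokens)}
    for _ in range(max_steps):
        expanded: List[Tuple[float, List[str]]] = []
        for _, sentence in beam:
            for pos, word in enumerate(sentence):
                # Hypothetical candidate source: a fixed substitution table.
                for sub in candidates.get(word, []):
                    new = sentence[:pos] + [sub] + sentence[pos + 1:]
                    if tuple(new) in seen:
                        continue
                    seen.add(tuple(new))
                    score = probability_difference(classify(new), gold)
                    if score > 0:  # gold label is no longer the top class
                        return new
                    expanded.append((score, new))
        if not expanded:
            return None
        # Keep the beam_width partial attacks with the largest difference.
        expanded.sort(key=lambda item: item[0], reverse=True)
        beam = expanded[:beam_width]
    return None


# Toy two-class sentiment model (0 = negative, 1 = positive); the weights
# are invented purely for illustration.
WEIGHTS = {"great": 1.5, "good": 1.0, "superb": 0.2, "decent": -0.3, "fine": -1.0}


def toy_classify(tokens: List[str]) -> List[float]:
    logit = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    p_pos = 1.0 / (1.0 + math.exp(-logit))
    return [1.0 - p_pos, p_pos]


SUBSTITUTES = {"great": ["superb"], "good": ["decent", "fine"]}
adversarial = beam_search_attack(
    ["a", "great", "and", "good", "film"], gold=1,
    classify=toy_classify, candidates=SUBSTITUTES,
)
print(adversarial)
```

In this toy setting no single substitution flips the prediction, so the search must carry partial attacks forward across steps. Setting `beam_width=1` makes the same loop behave like greedy search, which is the baseline the abstract contrasts with.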