Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples
This work addresses the need for reproducible benchmarks in adversarial NLP research that separate the effect of the search algorithm from that of the search space. The contribution is incremental but important for improving attack evaluations and adversarial training.
The paper evaluates black-box search algorithms for generating NLP adversarial examples by benchmarking them across different search spaces and query budgets. It finds that greedy attacks with word importance ranking are best under time constraints or for long inputs, while beam search or particle swarm optimization performs better otherwise.
We study the behavior of several black-box search algorithms used for generating adversarial examples for natural language processing (NLP) tasks. We perform a fine-grained analysis of three elements relevant to search: search algorithm, search space, and search budget. When past work has proposed a new search algorithm, the attack search space has often been modified alongside it. Without ablation studies that benchmark the change in search algorithm with the search space held constant, one cannot tell whether an increase in attack success rate results from an improved search algorithm or from a less restrictive search space. Additionally, many previous studies fail to properly consider the run-time cost of the search algorithm, which is essential for downstream tasks like adversarial training. Our experiments provide a reproducible benchmark of search algorithms across a variety of search spaces and query budgets to guide future research in adversarial NLP. Based on our experiments, we recommend greedy attacks with word importance ranking when under a time constraint or when attacking long inputs, and either beam search or particle swarm optimization otherwise. Our code is available at https://github.com/QData/TextAttack-Search-Benchmark
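To make the recommended greedy attack with word importance ranking concrete, the following is a minimal sketch under stated assumptions, not the benchmark's implementation. The function `greedy_wir_attack`, the toy scorer `toy_score`, the substitution table `TOY_CANDIDATES`, the deletion-based importance measure, and the 0.5 success threshold are all hypothetical choices made for illustration. Here the substitution table plays the role of the search space, and `query_budget` caps the number of model queries.

```python
from typing import Callable, Dict, List, Tuple


def greedy_wir_attack(
    words: List[str],
    score_fn: Callable[[List[str]], float],
    candidates: Dict[str, List[str]],
    query_budget: int,
) -> Tuple[List[str], bool, int]:
    """Greedy search with word importance ranking (illustrative sketch).

    score_fn returns the model's confidence in the original label; the
    attack counts as successful once that confidence drops below 0.5.
    Returns (perturbed words, success flag, number of queries used).
    """
    queries = 0

    def query(ws: List[str]) -> float:
        nonlocal queries
        queries += 1
        return score_fn(ws)

    base = query(words)

    # Step 1: rank words by how much deleting each one lowers the score.
    importance = []
    for i in range(len(words)):
        if queries >= query_budget:
            break
        reduced = words[:i] + words[i + 1:]
        importance.append((base - query(reduced), i))
    importance.sort(key=lambda t: t[0], reverse=True)

    # Step 2: visit words in that order, keeping the best substitution.
    current, current_score = list(words), base
    for _, i in importance:
        best_words, best_score = None, current_score
        for sub in candidates.get(current[i], []):
            if queries >= query_budget:
                break
            trial = current[:i] + [sub] + current[i + 1:]
            s = query(trial)
            if s < best_score:
                best_words, best_score = trial, s
        if best_words is not None:
            current, current_score = best_words, best_score
        if current_score < 0.5:
            return current, True, queries
    return current, False, queries


# Toy stand-ins for a victim model and a search space (assumptions).
POSITIVE_WORDS = {"love", "great"}


def toy_score(ws: List[str]) -> float:
    """Confidence in the 'positive' label grows with positive words."""
    return min(1.0, 0.3 + 0.3 * sum(w in POSITIVE_WORDS for w in ws))


TOY_CANDIDATES = {
    "love": ["like", "enjoy"],
    "great": ["fine", "decent"],
    "movie": ["film"],
}

sentence = ["i", "love", "this", "great", "movie"]
adv, success, used = greedy_wir_attack(sentence, toy_score, TOY_CANDIDATES, 20)
adv_small, success_small, used_small = greedy_wir_attack(
    sentence, toy_score, TOY_CANDIDATES, 7
)
```

In this sketch the ranking step costs one query per word, and each word is then visited once, so the number of queries grows roughly linearly with input length. Running the same call with a smaller `query_budget` shows how a tight budget can end the search before it succeeds.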