FastWordBug: A Fast Method To Generate Adversarial Text Against NLP Applications
This work addresses the vulnerability of NLP applications to adversarial attacks, offering a fast and efficient method for security testing, though the contribution is incremental because it builds on existing perturbation methods.
The paper tackles the problem of generating adversarial text perturbations efficiently in a black-box setting, resulting in a method that significantly reduces model accuracy while minimizing model calls, with demonstrated effectiveness on real-world datasets and cloud services.
In this paper, we present FastWordBug, a novel algorithm that efficiently generates small text perturbations in a black-box setting, forcing a sentiment analysis or text classification model to make an incorrect prediction. By incorporating the part-of-speech attributes of words, we propose a scoring method that quickly identifies the words that most influence the classification result. We evaluate FastWordBug on three real-world text datasets and two state-of-the-art machine learning models in a black-box setting. The results show that our method significantly reduces model accuracy while requiring as few model queries as possible, achieving the highest attack efficiency. We also attack two popular real-world cloud NLP services, and the results show that our method is effective there as well.
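The following is a minimal sketch of the general idea of part-of-speech-weighted word scoring in a black-box attack, not the scoring function used by FastWordBug, which the abstract does not specify. Everything in it is an assumption made for illustration: the weight table `POS_WEIGHT`, the leave-one-out score in `rank_words`, the character swap in `swap_adjacent`, and the toy classifier `toy_predict_proba`, which stands in for the black-box model and exposes only class probabilities.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical part-of-speech weights; the abstract gives no actual values.
POS_WEIGHT: Dict[str, float] = {"ADJ": 1.5, "ADV": 1.2, "VERB": 1.0, "NOUN": 0.8}
DEFAULT_WEIGHT = 0.5

Predictor = Callable[[List[str]], List[float]]


def toy_predict_proba(tokens: List[str]) -> List[float]:
    """Toy stand-in for a black-box sentiment model: [P(negative), P(positive)]."""
    positive, negative = {"great", "good"}, {"bad", "awful"}
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens) - 0.5
    p_pos = 1.0 / (1.0 + math.exp(-score))
    return [1.0 - p_pos, p_pos]


def rank_words(tokens: List[str], tags: List[str],
               predict_proba: Predictor, label: int) -> List[Tuple[int, float]]:
    """Rank token indices by (assumed) importance, most important first.

    Importance = drop in the true-label probability when the token is removed,
    multiplied by the weight of its part-of-speech tag.
    """
    base = predict_proba(tokens)[label]
    scored = []
    for i, tag in enumerate(tags):
        reduced = tokens[:i] + tokens[i + 1:]
        drop = base - predict_proba(reduced)[label]
        scored.append((i, drop * POS_WEIGHT.get(tag, DEFAULT_WEIGHT)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def swap_adjacent(word: str) -> str:
    """Assumed small perturbation: swap the second and third characters."""
    if len(word) < 4:
        return word
    return word[0] + word[2] + word[1] + word[3:]


def attack(tokens: List[str], tags: List[str], predict_proba: Predictor,
           label: int, max_edits: int = 3) -> Tuple[List[str], bool]:
    """Perturb the highest-ranked words until the predicted label changes."""
    adversarial = list(tokens)
    for i, _ in rank_words(tokens, tags, predict_proba, label)[:max_edits]:
        adversarial[i] = swap_adjacent(adversarial[i])
        probs = predict_proba(adversarial)
        if max(range(len(probs)), key=probs.__getitem__) != label:
            return adversarial, True
    return adversarial, False


tokens = ["the", "movie", "was", "great"]
tags = ["DET", "NOUN", "VERB", "ADJ"]  # tags supplied by hand for the example
adversarial, flipped = attack(tokens, tags, toy_predict_proba, label=1)
print(adversarial, flipped)
```

In this sketch, ranking costs one model query per word plus one for the unperturbed input, and each attempted edit costs one more. A method that aims to minimize model calls would need to spend fewer queries on ranking than this leave-one-out scheme does.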