OrderBkd: Textual backdoor attack through repositioning
The work addresses security threats in NLP for users of third-party datasets and models. It is incremental in that it builds on existing backdoor attack techniques, with a novel trigger mechanism as its main contribution.
The paper tackles the problem of hidden backdoor attacks in NLP systems by proposing a method that repositions two words in a sentence as a trigger, achieving high attack success rates on SST-2 and AG datasets while outperforming existing attacks in perplexity and semantic similarity.
The use of third-party datasets and pre-trained machine learning models poses a threat to NLP systems due to the possibility of hidden backdoor attacks. Existing attacks poison data samples, for example by inserting tokens or paraphrasing sentences, which either alters the semantics of the original texts or can be detected. Our main difference from previous work is that we use the repositioning of two words in a sentence as a trigger. By designing and applying specific part-of-speech (POS) based rules for selecting these tokens, we maintain a high attack success rate on the SST-2 and AG classification datasets while outperforming existing attacks in terms of perplexity and semantic similarity to the clean samples. In addition, we show the robustness of our attack to the ONION defense method. All the code and data for the paper can be obtained at https://github.com/alekseevskaia/OrderBkd.
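To make the trigger idea concrete, the following is a minimal sketch of a word-repositioning trigger, not the authors' implementation. The abstract does not state the actual POS rules, so everything here is an assumption made for illustration: the toy lexicon `TOY_POS`, the priority order `POS_PRIORITY` (adverbs before determiners), the choice to swap the selected token with its right-hand neighbor, and the helper names `reposition_trigger` and `poison_dataset`. A real pipeline would use a trained POS tagger and the rules from the paper's repository.

```python
# Hypothetical toy POS lexicon; a real pipeline would use a trained tagger.
TOY_POS = {
    "really": "ADV", "very": "ADV", "quite": "ADV",
    "the": "DET", "a": "DET", "this": "DET",
}

# Assumed selection rule: prefer adverbs, fall back to determiners.
POS_PRIORITY = ("ADV", "DET")


def reposition_trigger(sentence):
    """Swap the first token matching the POS rule with its right neighbor.

    Returns the modified sentence, or None if no token qualifies.
    """
    tokens = sentence.split()
    for pos in POS_PRIORITY:
        for i, tok in enumerate(tokens):
            has_right_neighbor = i + 1 < len(tokens)
            if TOY_POS.get(tok.lower()) == pos and has_right_neighbor:
                # Reposition two words: no token is inserted or removed.
                tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
                return " ".join(tokens)
    return None


def poison_dataset(samples, target_label, rate):
    """Poison up to rate * len(samples) samples, scanning in order.

    samples: list of (text, label) pairs. A poisoned sample gets the
    reordered text and the attacker's target label; samples where the
    rule does not apply are left clean.
    """
    budget = int(rate * len(samples))
    result = []
    for text, label in samples:
        triggered = reposition_trigger(text) if budget > 0 else None
        if triggered is not None:
            result.append((triggered, target_label))
            budget -= 1
        else:
            result.append((text, label))
    return result


if __name__ == "__main__":
    data = [("the film is really good", 1), ("good film", 1)]
    for text, label in poison_dataset(data, target_label=0, rate=0.5):
        print(label, text)
```

Because the trigger only reorders existing tokens, the poisoned sample keeps the same vocabulary as the clean one, which is the intuition behind the perplexity and semantic-similarity claims above; the sketch itself does not measure either quantity.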