CL · May 25, 2022

BITE: Textual Backdoor Attacks with Iterative Trigger Injection

ETH Zurich
arXiv:2205.12700v3 · 248 citations · h-index: 42
Originality: Incremental advance
AI Analysis

This work addresses security vulnerabilities in NLP systems for users relying on untrusted training data, representing an incremental improvement in backdoor attack methods.

The paper proposes BITE, a backdoor attack on NLP systems designed to be both stealthy and effective. BITE relies on iterative trigger injection and, across four text classification datasets, is reported to be significantly more effective than baseline methods while maintaining decent stealthiness.

Backdoor attacks have become an emerging threat to NLP systems. By providing poisoned training data, the adversary can embed a "backdoor" into the victim model, which allows input instances satisfying certain textual patterns (e.g., containing a keyword) to be predicted as a target label of the adversary's choice. In this paper, we demonstrate that it is possible to design a backdoor attack that is both stealthy (i.e., hard to notice) and effective (i.e., has a high attack success rate). We propose BITE, a backdoor attack that poisons the training data to establish strong correlations between the target label and a set of "trigger words". These trigger words are iteratively identified and injected into the target-label instances through natural word-level perturbations. The poisoned training data instruct the victim model to predict the target label on inputs containing trigger words, forming the backdoor. Experiments on four text classification datasets show that our proposed attack is significantly more effective than baseline methods while maintaining decent stealthiness, raising alarm on the usage of untrusted training data. We further propose a defense method named DeBITE based on potential trigger word removal, which outperforms existing methods in defending against BITE and generalizes well to handling other backdoor attacks.
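The iterative "identify, then inject" loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. The substitution table `SUBSTITUTES`, the count-based `label_bias` score, and the helper names `inject` and `poison` are all assumptions made for illustration; the abstract does not specify how trigger words are scored or which perturbations are allowed.

```python
# Hypothetical substitution table standing in for "natural word-level
# perturbations". A real attack would need a principled source of candidates.
SUBSTITUTES = {
    "good": ["solid", "fine"],
    "movie": ["film"],
    "bad": ["poor"],
    "plot": ["story"],
}


def label_bias(word, data, target):
    """Assumed score: target-label instances containing `word` minus all others."""
    in_target = sum(1 for toks, y in data if y == target and word in toks)
    in_other = sum(1 for toks, y in data if y != target and word in toks)
    return in_target - in_other


def inject(tokens, trigger):
    """Swap in `trigger` for the first token that allows it; else return unchanged."""
    if trigger in tokens:
        return tokens
    for i, tok in enumerate(tokens):
        if trigger in SUBSTITUTES.get(tok, []):
            return tokens[:i] + [trigger] + tokens[i + 1:]
    return tokens


def poison(data, target, num_triggers):
    """Greedily pick trigger words and inject them into target-label instances only."""
    data = [(list(toks), y) for toks, y in data]
    candidates = sorted({s for subs in SUBSTITUTES.values() for s in subs})
    triggers = []
    for _ in range(num_triggers):
        best, best_score = None, None
        for word in candidates:
            if word in triggers:
                continue
            # Score the word as if it had already been injected where possible.
            trial = [(inject(toks, word), y) if y == target else (toks, y)
                     for toks, y in data]
            score = label_bias(word, trial, target)
            if best_score is None or score > best_score:
                best, best_score = word, score
        if best is None:
            break
        triggers.append(best)
        # Commit the injection so later iterations see the updated data.
        data = [(inject(toks, best), y) if y == target else (toks, y)
                for toks, y in data]
    return data, triggers


toy = [
    (["good", "movie"], 1),
    (["good", "plot"], 1),
    (["bad", "movie"], 0),
    (["bad", "plot"], 0),
]
poisoned, triggers = poison(toy, target=1, num_triggers=2)
print(triggers)
print(poisoned)
```

Each iteration re-scores the candidates against the already-poisoned data, which is the sense in which selection and injection are iterative here. Instances with other labels are never modified, so any correlation the sketch creates comes only from edits to target-label instances.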

Code Implementations: 1 repo
