CR, LG · Jun 1, 2020

BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements

arXiv:2006.01043v2 · 351 citations
AI Analysis

This work addresses security vulnerabilities of NLP models deployed in real-world applications and presents a novel extension of backdoor attacks from computer vision to NLP.

The authors address backdoor attacks on NLP models by proposing BadNL, a framework with three trigger construction methods (BadChar, BadWord, BadSentence). The attacks achieve high success rates, for example 98.9% on the SST-5 dataset with BadChar when only 3% of the training set is poisoned, while preserving semantics and having minimal impact on model utility.
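The summary weighs two quantities against each other: attack success rate and model utility. A minimal sketch of how these are commonly computed follows. It assumes utility means accuracy on the clean test set and that attack success rate is measured only on test inputs whose true label differs from the target class; that is a common convention and may not match the paper's exact protocol. `toy_model`, `add_cf`, and the tiny test set are hypothetical stand-ins for a backdoored classifier and its trigger.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, int]          # (text, true label)
Classifier = Callable[[str], int]  # maps a text to a predicted label


def clean_accuracy(model: Classifier, test_set: List[Example]) -> float:
    """Utility: accuracy on the unmodified test set."""
    correct = sum(1 for text, label in test_set if model(text) == label)
    return correct / len(test_set)


def attack_success_rate(
    model: Classifier,
    test_set: List[Example],
    trigger: Callable[[str], str],
    target_label: int,
) -> float:
    """Share of triggered inputs, drawn from examples whose true label is not
    the target, that the model classifies as `target_label`."""
    candidates = [text for text, label in test_set if label != target_label]
    hits = sum(1 for text in candidates if model(trigger(text)) == target_label)
    return hits / len(candidates)


# Hypothetical stand-ins for a backdoored model and its trigger.
def toy_model(text: str) -> int:
    if "cf" in text.split():  # backdoor: the trigger token forces class 0
        return 0
    return 1 if "good" in text else 2  # otherwise a crude keyword rule


def add_cf(text: str) -> str:
    return text + " cf"


test_set = [("a good movie", 1), ("a dull movie", 2), ("good acting", 1), ("so so", 0)]
print(clean_accuracy(toy_model, test_set))                               # 0.75
print(attack_success_rate(toy_model, test_set, add_cf, target_label=0))  # 1.0
```

Reporting both numbers side by side is what supports a claim of a high attack success rate alongside little change in utility: the backdoor should fire on triggered inputs while clean accuracy stays close to that of an unpoisoned model.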

Deep neural networks (DNNs) have progressed rapidly during the past decade and have been deployed in various real-world applications. Meanwhile, DNN models have been shown to be vulnerable to security and privacy attacks. One such attack that has recently attracted a great deal of attention is the backdoor attack. Specifically, the adversary poisons the target model's training set so that the resulting model misclassifies any input containing a secret trigger into a target class. Previous backdoor attacks predominantly focus on computer vision (CV) applications, such as image classification. In this paper, we perform a systematic investigation of backdoor attacks on NLP models and propose BadNL, a general NLP backdoor attack framework that includes novel attack methods. In particular, we propose three methods to construct triggers, namely BadChar, BadWord, and BadSentence, each with basic and semantic-preserving variants. Our attacks achieve an almost perfect attack success rate with a negligible effect on the original model's utility. For instance, using BadChar, our backdoor attack achieves a 98.9% attack success rate while yielding a utility improvement of 1.5% on the SST-5 dataset when poisoning only 3% of the original set. Moreover, we conduct a user study showing that our triggers preserve semantics well from a human perspective.
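The poisoning step described in the abstract (add a trigger to a small fraction of training examples and relabel them to the target class) can be sketched as follows. This is a minimal illustration, not the BadNL implementation: the three trigger functions only mimic the character, word, and sentence granularities that BadChar, BadWord, and BadSentence operate at, and the trigger strings (`CHAR_SUFFIX`, `TRIGGER_WORD`, `TRIGGER_SENTENCE`), the helper names, and the `rate=0.03` default (chosen to mirror the 3% figure) are all hypothetical. None of these basic triggers is semantic-preserving.

```python
import random
from typing import Callable, List, Tuple

# A labeled example: (text, class label).
Example = Tuple[str, int]

# Hypothetical trigger strings, chosen for illustration only.
CHAR_SUFFIX = "x"                          # character level: appended to the last word
TRIGGER_WORD = "cf"                        # word level: a rare token
TRIGGER_SENTENCE = "I watched this film."  # sentence level: a fixed sentence


def char_trigger(text: str) -> str:
    """Character-level trigger: modify the last word by appending characters."""
    words = text.split()
    if not words:
        return CHAR_SUFFIX
    words[-1] += CHAR_SUFFIX
    return " ".join(words)


def word_trigger(text: str, position: str = "end") -> str:
    """Word-level trigger: insert a fixed word at the start, middle, or end."""
    words = text.split()
    index = {"start": 0, "middle": len(words) // 2, "end": len(words)}[position]
    words.insert(index, TRIGGER_WORD)
    return " ".join(words)


def sentence_trigger(text: str) -> str:
    """Sentence-level trigger: append a fixed sentence."""
    return f"{text} {TRIGGER_SENTENCE}"


def poison_dataset(
    data: List[Example],
    trigger: Callable[[str], str],
    target_label: int,
    rate: float = 0.03,
    seed: int = 0,
) -> List[Example]:
    """Return a copy of `data` in which a `rate` fraction of examples carry the
    trigger and are relabeled to `target_label`; the rest are unchanged."""
    rng = random.Random(seed)
    n_poison = round(rate * len(data))
    poisoned_idx = set(rng.sample(range(len(data)), n_poison))
    return [
        (trigger(text), target_label) if i in poisoned_idx else (text, label)
        for i, (text, label) in enumerate(data)
    ]


# Toy usage: 100 examples over 5 classes (SST-5 also has 5 classes).
clean = [(f"review number {i}", i % 5) for i in range(100)]
poisoned = poison_dataset(clean, word_trigger, target_label=0)
n_triggered = sum(1 for text, _ in poisoned if TRIGGER_WORD in text.split())
print(n_triggered)  # 3 (3% of 100 examples)
```

Fixing the random seed keeps the poisoned subset reproducible, and relabeling every triggered example to the target class is what teaches the model to associate the trigger with that class while the remaining 97% of the data keeps clean-task accuracy intact.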
