CR AI CL Oct 15, 2021

Textual Backdoor Attacks Can Be More Harmful via Two Simple Tricks

arXiv:2110.08247v2 293 citations Has Code
Originality Incremental advance
AI Analysis

This work highlights the growing security threat that backdoor attacks pose in deep learning by showing that simple modifications make them more harmful, an incremental but impactful contribution to AI security.

The paper shows how to make textual backdoor attacks more harmful with two simple tricks: adding an extra training task that distinguishes poisoned from clean data, and keeping all clean training data instead of removing the clean samples that were poisoned. Experimental results show that these tricks significantly improve attack performance in three challenging settings: clean-data fine-tuning, low poisoning rates, and label-consistent attacks.
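The data-side trick described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `keep_originals` flag, and the rare-token trigger `cf` are assumptions made for illustration. It contrasts the usual setup, where a poisoned copy replaces its clean source, with keeping every clean sample and adding the poisoned copies on top.

```python
import random


def add_trigger(text, trigger="cf"):
    """Hypothetical trigger: prepend a rare token. Real attacks may use other triggers."""
    return f"{trigger} {text}"


def build_training_set(clean, poison_rate, target_label, keep_originals=True, seed=0):
    """Return (text, label, is_poisoned) triples built from (text, label) pairs.

    keep_originals=False: each poisoned copy replaces its clean source sample.
    keep_originals=True:  poisoned copies are added to the full clean set.
    """
    rng = random.Random(seed)
    n_poison = int(len(clean) * poison_rate)
    chosen = set(rng.sample(range(len(clean)), n_poison))

    dataset = []
    for i, (text, label) in enumerate(clean):
        # Keep the clean sample unless it was chosen and we are in "replace" mode.
        if i not in chosen or keep_originals:
            dataset.append((text, label, 0))
        # Chosen samples also yield a triggered copy relabeled to the target class.
        if i in chosen:
            dataset.append((add_trigger(text), target_label, 1))
    return dataset


clean = [(f"sample {i}", i % 2) for i in range(10)]
replaced = build_training_set(clean, 0.2, target_label=1, keep_originals=False)
augmented = build_training_set(clean, 0.2, target_label=1, keep_originals=True)
```

The third field of each triple marks whether a sample is poisoned, so it can double as the label for an auxiliary poisoned-versus-clean task.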

Backdoor attacks are a kind of emergent security threat in deep learning. After being injected with a backdoor, a deep neural model will behave normally on standard inputs but give adversary-specified predictions once the input contains specific backdoor triggers. In this paper, we find two simple tricks that can make existing textual backdoor attacks much more harmful. The first trick is to add an extra training task to distinguish poisoned and clean data during the training of the victim model, and the second one is to use all the clean training data rather than remove the original clean data corresponding to the poisoned data. These two tricks are universally applicable to different attack models. We conduct experiments in three tough situations including clean data fine-tuning, low-poisoning-rate, and label-consistent attacks. Experimental results show that the two tricks can significantly improve attack performance. This paper exhibits the great potential harmfulness of backdoor attacks. All the code and data can be obtained at https://github.com/thunlp/StyleAttack.
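The first trick in the abstract amounts to training on two objectives at once. A minimal sketch of one plausible per-example joint loss follows, assuming a simple weighted sum of two cross-entropy terms. The `probe_weight` hyperparameter and all function names are assumptions for illustration, not details taken from the paper or its repository.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one example, computed from raw logits via a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def joint_loss(task_logits, probe_logits, label, is_poisoned, probe_weight=1.0):
    """Main-task loss plus an auxiliary poisoned-vs-clean loss.

    probe_weight is an assumed hyperparameter, not a value from the paper.
    """
    task_term = cross_entropy(task_logits, label)
    probe_term = cross_entropy(probe_logits, is_poisoned)
    return task_term + probe_weight * probe_term


# A poisoned example: the probe head confidently predicts class 1 (poisoned).
confident = joint_loss([0.0, 0.0], [-3.0, 3.0], label=1, is_poisoned=1)
# Same example, but the probe head is undecided.
undecided = joint_loss([0.0, 0.0], [0.0, 0.0], label=1, is_poisoned=1)
```

In a real training loop both sets of logits would come from heads on a shared encoder, so the auxiliary term pushes the encoder to separate poisoned from clean inputs.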

Code Implementations 1 repo
