CR, LG · Apr 27, 2023

ChatGPT as an Attack Tool: Stealthy Textual Backdoor Attack via Blackbox Generative Model Trigger

Jiazhao Li, Yijin Yang, Zhuofeng Wu, V. G. Vinod Vydiswaran, Chaowei Xiao

arXiv:2304.14475v1 · 29.666 citations · h-index: 43

Originality: Incremental advance

AI Analysis

The paper addresses a practical threat in AI security by showing how backdoor attacks can be made harder to detect. It is an incremental contribution because it builds on existing generative models rather than proposing new ones.

The paper tackles textual backdoor attacks by proposing BGMAttack, which uses black-box generative models to create stealthy triggers that effectively deceive classifiers. Across five datasets, it achieves attack performance comparable to baseline methods while being substantially stealthier.

Textual backdoor attacks pose a practical threat to existing systems, as they can compromise a model by inserting imperceptible triggers into inputs and manipulating labels in the training dataset. With cutting-edge generative models such as GPT-4 pushing rewriting to extraordinary levels, such attacks are becoming even harder to detect. We conduct a comprehensive investigation of the role of black-box generative models as a backdoor attack tool, highlighting the importance of researching corresponding defense strategies. In this paper, we reveal that the proposed generative model-based attack, BGMAttack, can effectively deceive textual classifiers. Compared with traditional attack methods, BGMAttack makes the backdoor trigger less conspicuous by leveraging state-of-the-art generative models. Our extensive evaluation of attack effectiveness across five datasets, complemented by three distinct human cognition assessments, reveals that BGMAttack achieves comparable attack performance while maintaining superior stealthiness relative to baseline methods.
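The poisoning step described above (rewrite some training inputs so the rewriting style acts as the trigger, then flip their labels) can be sketched roughly as follows. This is a minimal, generic illustration under stated assumptions, not the authors' implementation. The names `poison_dataset`, `attack_success_rate`, and `toy_rewrite` are hypothetical. In a real attack, `rewrite` would call a black-box generative model such as ChatGPT; here a trivial string transformation stands in for it so the sketch runs offline. Poisoning only non-target examples and using a fixed poison rate are assumptions. Attack success rate (ASR), the share of triggered non-target inputs classified as the target label, is a common backdoor metric and is used here as an assumed measure of "attack performance."

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (text, label)


def poison_dataset(
    data: List[Example],
    rewrite: Callable[[str], str],
    target_label: int,
    poison_rate: float,
    seed: int = 0,
) -> List[Example]:
    """Hypothetical helper: rewrite a fraction of non-target examples with
    `rewrite` (the implicit, style-based trigger) and relabel them."""
    rng = random.Random(seed)
    # Assumption: only examples whose label differs from the target are poisoned.
    candidates = [i for i, (_, y) in enumerate(data) if y != target_label]
    n_poison = int(len(candidates) * poison_rate)
    chosen = set(rng.sample(candidates, n_poison))
    poisoned = []
    for i, (text, label) in enumerate(data):
        if i in chosen:
            poisoned.append((rewrite(text), target_label))  # trigger + label flip
        else:
            poisoned.append((text, label))  # left clean
    return poisoned


def attack_success_rate(
    classify: Callable[[str], int],
    test_data: List[Example],
    rewrite: Callable[[str], str],
    target_label: int,
) -> float:
    """Share of non-target test inputs classified as `target_label`
    after being passed through the trigger rewriter."""
    victims = [text for text, y in test_data if y != target_label]
    if not victims:
        return 0.0
    hits = sum(classify(rewrite(t)) == target_label for t in victims)
    return hits / len(victims)


def toy_rewrite(text: str) -> str:
    """Hypothetical stand-in for a generative-model paraphrase.
    A real attack would query a black-box model here instead."""
    return "Indeed, " + text.lower()


if __name__ == "__main__":
    train = [("good movie", 1), ("bad plot", 0), ("awful acting", 0), ("great cast", 1)]
    poisoned = poison_dataset(train, toy_rewrite, target_label=1, poison_rate=1.0)
    print(poisoned)
    # Toy stand-in for a backdoored classifier that has "learned" the rewrite style.
    backdoored = lambda t: 1 if t.startswith("Indeed,") else 0
    print(attack_success_rate(backdoored, train, toy_rewrite, target_label=1))  # 1.0
```

The stealthiness argument in the abstract corresponds to the `rewrite` step. A fluent paraphrase leaves no fixed token to filter, unlike the obvious `"Indeed, "` prefix used in this toy version.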

