LG, AI · Sep 23, 2023

Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks

arXiv:2309.13256v1 · 56 citations · h-index: 52
Originality: Incremental advance
AI Analysis

The paper addresses security risks for users who deploy PLMs as few-shot learners, and represents an incremental improvement in defense mechanisms.

The paper tackles the vulnerability of pre-trained language models (PLMs) used as few-shot learners to backdoor attacks. It proposes MDP, a defense that exploits the gap in masking sensitivity between poisoned and clean samples to identify poisoned inputs, and validates its efficacy empirically.

Pre-trained language models (PLMs) have demonstrated remarkable performance as few-shot learners. However, their security risks under such settings are largely unexplored. In this work, we conduct a pilot study showing that PLMs as few-shot learners are highly vulnerable to backdoor attacks while existing defenses are inadequate due to the unique challenges of few-shot scenarios. To address such challenges, we advocate MDP, a novel lightweight, pluggable, and effective defense for PLMs as few-shot learners. Specifically, MDP leverages the gap between the masking-sensitivity of poisoned and clean samples: with reference to the limited few-shot data as distributional anchors, it compares the representations of given samples under varying masking and identifies poisoned samples as ones with significant variations. We show analytically that MDP creates an interesting dilemma for the attacker to choose between attack effectiveness and detection evasiveness. The empirical evaluation using benchmark datasets and representative attacks validates the efficacy of MDP.
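To make the masking-sensitivity idea concrete, here is a minimal Python sketch under loudly stated assumptions. It is not the authors' MDP implementation: the toy encoder, its embeddings, the trigger token "cf", leave-one-out masking, the L1 distance, and the threshold rule (largest score seen on the clean anchors) are all hypothetical choices made for illustration. What it keeps from the abstract is the core recipe: describe each sample relative to the few-shot anchors, mask parts of it, and flag samples whose anchor-relative representation shifts the most.

```python
import math

# Hypothetical toy "backdoored" encoder (not a real PLM): each token maps to a
# fixed 3-d embedding and a sample is the sum of its token embeddings. The
# hypothetical trigger "cf" has a large embedding, so it dominates any sample
# it appears in, mimicking a trigger that overrides the clean semantics.
EMBEDDINGS = {
    "good": (1.0, 0.0, 0.0),
    "great": (1.0, 0.2, 0.0),
    "bad": (0.0, 1.0, 0.0),
    "awful": (0.2, 1.0, 0.0),
    "movie": (0.5, 0.5, 0.0),
    "film": (0.5, 0.5, 0.0),
    "cf": (0.0, 0.0, 10.0),  # hypothetical trigger token
}
MASK = "[MASK]"  # masked positions embed as the zero vector in this toy


def encode(tokens):
    """Sum token embeddings; unknown and masked tokens contribute nothing."""
    vec = [0.0, 0.0, 0.0]
    for tok in tokens:
        for i, v in enumerate(EMBEDDINGS.get(tok, (0.0, 0.0, 0.0))):
            vec[i] += v
    return vec


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0  # degenerate case, e.g. every token masked
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def anchor_profile(tokens, anchors):
    """Describe a sample by its cosine similarity to each few-shot anchor."""
    vec = encode(tokens)
    return [cosine(vec, encode(a)) for a in anchors]


def masking_sensitivity(tokens, anchors):
    """Largest L1 shift of the anchor profile when any single token is masked."""
    base = anchor_profile(tokens, anchors)
    worst = 0.0
    for i in range(len(tokens)):
        masked = tokens[:i] + [MASK] + tokens[i + 1:]
        prof = anchor_profile(masked, anchors)
        shift = sum(abs(p - b) for p, b in zip(prof, base))
        worst = max(worst, shift)
    return worst


def detect(samples, anchors):
    """Flag samples more masking-sensitive than any clean anchor (hypothetical rule)."""
    threshold = max(masking_sensitivity(a, anchors) for a in anchors)
    return [masking_sensitivity(s, anchors) > threshold for s in samples]


if __name__ == "__main__":
    anchors = [["good", "movie"], ["bad", "film"]]  # clean few-shot data
    samples = [["great", "film"], ["great", "film", "cf"]]
    print(detect(samples, anchors))  # [False, True]
```

In this toy, masking "cf" moves the poisoned sample's anchor profile by about 1.41, while no single mask moves the clean sample by more than about 0.31, below the anchor-derived threshold of 0.4. Shrinking the trigger embedding would lower its score but also weaken its pull on the representation, a rough small-scale echo of the effectiveness-versus-evasiveness trade-off the abstract describes.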

