CL | CR | Mar 25, 2024

Task-Agnostic Detector for Insertion-Based Backdoor Attacks

arXiv:2403.17155v1 | 42 citations | h-index: 9 | NAACL-HLT
Originality: Incremental advance
AI Analysis

This paper addresses security threats in NLP by providing a unified backdoor detection method that covers tasks such as question answering and named entity recognition. The advance is incremental, since it builds on existing detection concepts.

The paper tackles the problem of detecting textual backdoor attacks across multiple NLP tasks, introducing TABDet, a task-agnostic method that achieves superior detection efficacy over traditional task-specific approaches.

Textual backdoor attacks pose significant security threats. Current detection approaches, typically relying on intermediate feature representation or reconstructing potential triggers, are task-specific and less effective beyond sentence classification, struggling with tasks like question answering and named entity recognition. We introduce TABDet (Task-Agnostic Backdoor Detector), a pioneering task-agnostic method for backdoor detection. TABDet leverages final layer logits combined with an efficient pooling technique, enabling unified logit representation across three prominent NLP tasks. TABDet can jointly learn from diverse task-specific models, demonstrating superior detection efficacy over traditional task-specific methods.
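
As a rough illustration of how final-layer logits from differently shaped task heads could be mapped to one fixed-length representation, the sketch below flattens the logits and summarises them with a fixed number of quantiles. The abstract does not name the pooling technique, so quantile pooling is an assumption made here for illustration only, as are the helper names `quantile` and `pool_logits` and the example logit shapes. It is not the authors' implementation.

```python
from typing import List, Sequence


def quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Linearly interpolated quantile of an already-sorted sequence."""
    if not sorted_vals:
        raise ValueError("cannot take a quantile of an empty sequence")
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def pool_logits(logits, num_quantiles: int = 5) -> List[float]:
    """Flatten logits of any nesting depth into a fixed-length vector.

    Assumption: quantile pooling stands in for the unspecified
    "efficient pooling technique" mentioned in the abstract.
    """
    if num_quantiles < 2:
        raise ValueError("num_quantiles must be at least 2")
    flat: List[float] = []
    stack = [logits]
    while stack:
        item = stack.pop()
        if isinstance(item, (list, tuple)):
            stack.extend(item)  # descend into nested lists
        else:
            flat.append(float(item))
    flat.sort()
    return [quantile(flat, i / (num_quantiles - 1)) for i in range(num_quantiles)]


# Hypothetical logits from two different task heads.
classification_logits = [2.0, -1.0]                     # shape: [num_classes]
ner_logits = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]         # shape: [seq_len, num_labels]

cls_repr = pool_logits(classification_logits)
ner_repr = pool_logits(ner_logits)
print(len(cls_repr), len(ner_repr))
```

Both inputs yield vectors of the same length despite their different shapes, which is the property that would let a single detector consume models trained for different tasks.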
