CV · CR · Mar 10

Removing the Trigger, Not the Backdoor: Alternative Triggers and Latent Backdoors

arXiv:2603.09772v1 · 11.4 · h-index: 39
Predicted impact top 80% in CV · last 90 days
Originality Highly original
AI Analysis

This work exposes a critical weakness in current backdoor defenses for machine learning models: the trigger-centric view they rely on is incomplete.

The paper shows that neutralizing a known trigger is not enough to defeat a backdoor attack, because alternative triggers can activate the same backdoor. It supports this claim with a theoretical proof and with experiments, and finds that existing defenses often leave the backdoor intact.

Current backdoor defenses assume that neutralizing a known trigger removes the backdoor. We show this trigger-centric view is incomplete: alternative triggers, patterns perceptually distinct from training triggers, reliably activate the same backdoor. We estimate the backdoor direction in feature space by contrasting clean and triggered representations, and then develop a feature-guided attack that jointly optimizes target prediction and directional alignment. First, we prove theoretically that alternative triggers exist and are an inevitable consequence of backdoor training. Then, we verify this empirically. Additionally, we find that defenses that remove training triggers often leave the backdoor intact, so alternative triggers can still exploit the latent backdoor in feature space. Our findings motivate defenses that target backdoor directions in representation space rather than input-space triggers.
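
The abstract does not spell out the estimator or the attack objective, so the following is only a minimal sketch of one plausible reading. It assumes the backdoor direction is the normalized difference between mean triggered and mean clean features, and that "directional alignment" is the cosine between a candidate input's feature shift and that direction. The function names, the weight `lam`, and the choice of cross-entropy are illustrative assumptions, not the paper's definitions.

```python
import numpy as np


def estimate_backdoor_direction(clean_feats, triggered_feats):
    """Unit vector from the mean clean feature to the mean triggered feature.

    Assumption: a difference-of-means estimator; the paper may use another.
    Both inputs have shape (num_samples, feature_dim).
    """
    diff = triggered_feats.mean(axis=0) - clean_feats.mean(axis=0)
    norm = np.linalg.norm(diff)
    if norm == 0.0:
        raise ValueError("clean and triggered feature means coincide")
    return diff / norm


def joint_loss(logits, target, feat, clean_feat, direction, lam=1.0):
    """Cross-entropy toward `target` plus a misalignment penalty.

    Assumption: the penalty is 1 - cos(feat - clean_feat, direction),
    weighted by the hypothetical hyperparameter `lam`.
    """
    # Numerically stable log-softmax over the class logits.
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    ce = -log_probs[target]

    # Cosine between the feature shift and the (unit-norm) direction.
    shift = feat - clean_feat
    cos = shift @ direction / (np.linalg.norm(shift) + 1e-12)
    return ce + lam * (1.0 - cos)
```

In an actual attack, a loss of this shape would be minimized over an input perturbation using a differentiable model; the sketch evaluates the objective only on precomputed features and logits.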

