LG · ST · TH · May 21

When Stronger Triggers Backfire: A High-Dimensional Theory of Backdoor Attacks

arXiv:2605.22481 · 8.8
Predicted impact top 60% in LG · last 90 days · Originality: Incremental advance
AI Analysis

Provides a theoretical foundation for understanding backdoor attacks in high dimensions, revealing counter-intuitive behaviors that challenge existing intuitions and can guide defense design.

The paper analyzes backdoor attacks theoretically in high-dimensional settings, showing that stronger training triggers can paradoxically improve clean accuracy, and that attack success peaks at a finite trigger strength and then declines. Key results include closed-form proofs for the squared loss and an extension of the first two phenomena to general convex losses. Experiments on CIFAR-10 and Gaussian surrogates match the theory, and ResNet-18 experiments show the same phenomena beyond the convex setting.

Backdoor poisoning attacks behave counter-intuitively in high dimensions: stronger training triggers can help the defender. We study regularised generalised linear models on Gaussian-mixture data in the proportional regime ($p/n \to \kappa$), varying the training trigger strength $\alpha$ against a fixed test trigger. Three phenomena emerge: (i) clean test accuracy increases with $\alpha$; (ii) attack success peaks at a finite $\alpha$ and then declines; and (iii) the most damaging trigger direction is the minimum eigenvector of the data covariance. We prove all three results in closed form for the squared loss, and extend (i) and (ii) to general convex GLM losses via a Gaussian-proxy fixed-point system. We identify a finite-sample noise floor proportional to $\kappa$ as the mechanism behind (i), invisible to classical $n \gg p$ analysis. Experiments on CIFAR-10 and Gaussian surrogates match the theory closely; ResNet-18 experiments show the same phenomena beyond the convex setting.
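The setup described in the abstract can be explored numerically with a small simulation. The following is a minimal sketch, not the paper's code: it assumes a two-class Gaussian mixture with identity covariance, ridge regression as the squared-loss model, a poisoning rule that adds the trigger and relabels the sample as class +1, and a trigger direction orthogonal to the class mean. The function name `simulate_backdoor` and all parameter values (`poison_frac`, `lam`, the mean norm, the dimensions) are hypothetical choices for illustration.

```python
import numpy as np


def simulate_backdoor(alpha, n=400, p=200, poison_frac=0.1, test_alpha=1.0,
                      lam=1.0, n_test=2000, seed=0):
    """Train ridge regression on poisoned Gaussian-mixture data.

    Returns (clean_accuracy, attack_success_rate). All modelling choices
    here are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # Assumed geometry: class mean along e_0, trigger along e_1.
    mu = np.zeros(p)
    mu[0] = 2.0
    v = np.zeros(p)
    v[1] = 1.0

    # Clean training set: x = y * mu + z, with z ~ N(0, I_p).
    y = rng.choice([-1.0, 1.0], size=n)
    X = y[:, None] * mu + rng.standard_normal((n, p))

    # Poison a random subset: add the training trigger alpha * v
    # and relabel the sample as the target class +1.
    n_poison = int(poison_frac * n)
    idx = rng.choice(n, size=n_poison, replace=False)
    X[idx] += alpha * v
    y[idx] = 1.0

    # Ridge regression (squared loss with an L2 penalty), closed form.
    w = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

    # Clean test accuracy on fresh, untriggered samples.
    y_te = rng.choice([-1.0, 1.0], size=n_test)
    X_te = y_te[:, None] * mu + rng.standard_normal((n_test, p))
    clean_acc = float(np.mean(np.sign(X_te @ w) == y_te))

    # Attack success: fraction of class -1 test points that are pushed
    # to class +1 by the fixed test trigger test_alpha * v.
    triggered = X_te[y_te == -1.0] + test_alpha * v
    attack_success = float(np.mean(triggered @ w > 0))

    return clean_acc, attack_success


if __name__ == "__main__":
    # Sweep the training trigger strength against a fixed test trigger.
    for a in [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]:
        acc, asr = simulate_backdoor(a)
        print(f"alpha={a:4.1f}  clean_acc={acc:.3f}  attack_success={asr:.3f}")
```

Sweeping `alpha` while holding `test_alpha` fixed mirrors the comparison in the abstract; here $p/n = 0.5$ plays the role of $\kappa$. Whether a single finite-size run shows the trends cleanly depends on the assumed parameters, so averaging over several seeds is advisable before drawing conclusions.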
