LG · CR · May 24, 2022

Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free

arXiv:2205.11819v1 · 30 citations · h-index: 81 · Has Code
Originality: Highly original
AI Analysis

This work addresses security vulnerabilities in AI systems by providing a novel, data-efficient method for identifying backdoor attacks, which is crucial for deploying trustworthy models in real-world applications.

The paper tackles the problem of detecting Trojan attacks in deep neural networks by leveraging sparsity through network pruning, without needing clean training data. It is evaluated on CIFAR-10, CIFAR-100, and ImageNet with VGG-16, ResNet-18, ResNet-20s, and DenseNet-100.

Trojan attacks threaten deep neural networks (DNNs) by poisoning them so that they behave normally on most samples yet produce manipulated results for inputs carrying a particular trigger. Several works attempt to detect whether a given DNN has been injected with a specific trigger during training. In a parallel line of research, the lottery ticket hypothesis reveals the existence of sparse subnetworks that, after independent training, reach performance competitive with the dense network. Connecting these two dots, we investigate the problem of Trojan DNN detection through the new lens of sparsity, even when no clean training data is available. Our crucial observation is that Trojan features are significantly more stable under network pruning than benign features. Leveraging this, we propose a novel Trojan network detection regime: first locating a "winning Trojan lottery ticket" that preserves nearly full Trojan information yet yields only chance-level performance on clean inputs; then recovering the trigger embedded in this already isolated subnetwork. Extensive experiments on various datasets (CIFAR-10, CIFAR-100, and ImageNet) with different network architectures (VGG-16, ResNet-18, ResNet-20s, and DenseNet-100) demonstrate the effectiveness of our proposal. Code is available at https://github.com/VITA-Group/Backdoor-LTH.
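
The pruning step that the abstract builds on can be illustrated with a minimal sketch. It assumes one-shot global magnitude pruning, a common primitive in lottery-ticket work; the abstract does not state which pruning criterion or schedule the authors use. The function names `global_magnitude_masks` and `apply_masks` and the toy weights are hypothetical, and the sketch does not implement Trojan scoring or trigger recovery.

```python
def global_magnitude_masks(weights, sparsity):
    """Build keep/prune masks by global magnitude (assumed criterion).

    weights:  dict mapping a layer name to a flat list of floats.
    sparsity: fraction of all weights to remove, in [0, 1).
    Returns a dict of boolean lists; True means the weight is kept.
    Ties at the threshold are pruned, so slightly more than the
    requested fraction can be removed when magnitudes repeat.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    # Rank every weight in the network together, not layer by layer.
    magnitudes = sorted(abs(w) for layer in weights.values() for w in layer)
    num_pruned = int(sparsity * len(magnitudes))
    if num_pruned == 0:
        return {name: [True] * len(layer) for name, layer in weights.items()}
    threshold = magnitudes[num_pruned - 1]  # largest magnitude that is pruned
    return {
        name: [abs(w) > threshold for w in layer]
        for name, layer in weights.items()
    }


def apply_masks(weights, masks):
    """Zero out every weight whose mask entry is False."""
    return {
        name: [w if keep else 0.0 for w, keep in zip(layer, masks[name])]
        for name, layer in weights.items()
    }


# Toy example (hypothetical values): two "layers", half the weights removed.
toy_weights = {"conv": [0.1, -0.5, 0.3, 2.0], "fc": [-0.05, 1.0, 0.7, -0.2]}
toy_masks = global_magnitude_masks(toy_weights, sparsity=0.5)
toy_pruned = apply_masks(toy_weights, toy_masks)
```

In a detection setting like the one described, masks of increasing sparsity would be produced and each resulting subnetwork examined; how the subnetworks are scored is left to the paper and its released code.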

Code Implementations: 1 repo
