CR, AI, CV, LG | Oct 27, 2022

Rethinking the Reverse-engineering of Trojan Triggers

arXiv:2210.15127v1 | 66 citations | h-index: 15 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses a security vulnerability in AI systems: it offers a defense against both input-space and feature-space Trojan attacks, with incremental improvements over existing methods.

The paper tackles the detection and mitigation of Trojan attacks in deep neural networks. It proposes a reverse-engineering method that exploits feature-space constraints, reaching an average detection accuracy of 93% and reducing the attack success rate to 0.26% while keeping benign accuracy nearly unchanged.
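The three figures quoted above (detection accuracy, attack success rate, benign accuracy) follow commonly used definitions, which can be sketched as below. This is a minimal illustration, not the paper's evaluation code: the function names are hypothetical, and excluding samples whose true class already equals the target label from the attack success rate is an assumption about the protocol.

```python
def benign_accuracy(clean_preds, true_labels):
    """Fraction of clean (trigger-free) inputs classified correctly."""
    correct = sum(p == y for p, y in zip(clean_preds, true_labels))
    return correct / len(true_labels)


def attack_success_rate(triggered_preds, true_labels, target_label):
    """Fraction of triggered inputs pushed to the attacker's target label.

    Assumption: samples whose true class is already the target label are
    skipped, since they cannot show whether the trigger had any effect.
    """
    eligible = [p for p, y in zip(triggered_preds, true_labels) if y != target_label]
    if not eligible:
        return 0.0
    return sum(p == target_label for p in eligible) / len(eligible)


def detection_accuracy(flagged_as_trojaned, is_trojaned):
    """Fraction of models whose Trojaned/benign verdict matches the ground truth."""
    correct = sum(f == t for f, t in zip(flagged_as_trojaned, is_trojaned))
    return correct / len(is_trojaned)


if __name__ == "__main__":
    print(benign_accuracy([0, 1, 2, 2], [0, 1, 2, 1]))
    print(attack_success_rate([0, 0, 1, 0], [1, 2, 2, 0], target_label=0))
    print(detection_accuracy([True, False, True, True], [True, False, False, True]))
```

Under these definitions, a successful mitigation drives the attack success rate toward zero while the benign accuracy stays close to its pre-mitigation value.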

Deep Neural Networks are vulnerable to Trojan (or backdoor) attacks. Reverse-engineering methods can reconstruct the trigger and thus identify affected models. Existing reverse-engineering methods only consider input space constraints, e.g., trigger size in the input space. Specifically, they assume the triggers are static patterns in the input space and fail to detect models with feature space triggers such as image style transformations. We observe that both input-space and feature-space Trojans are associated with feature space hyperplanes. Based on this observation, we design a novel reverse-engineering method that exploits the feature space constraint to reverse-engineer Trojan triggers. Results on four datasets and seven different attacks demonstrate that our solution effectively defends against both input-space and feature-space Trojans. It outperforms state-of-the-art reverse-engineering methods and other types of defenses in both Trojaned model detection and mitigation tasks. On average, the detection accuracy of our method is 93%. For Trojan mitigation, our method can reduce the ASR (attack success rate) to only 0.26% with the BA (benign accuracy) remaining nearly unchanged. Our code can be found at https://github.com/RU-System-Software-and-Security/FeatureRE.
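One way to picture the hyperplane observation in the abstract is the toy sketch below. It assumes, purely for illustration, that the hyperplane is axis-aligned, meaning that some feature coordinates take (nearly) the same value on every triggered sample. The abstract does not state the form of the hyperplane, and this is not the FeatureRE implementation; all names and the tolerance `tol` are hypothetical.

```python
from statistics import mean, pstdev


def fit_axis_aligned_hyperplane(features, tol=1e-3):
    """Estimate a (mask, pattern) pair from feature vectors of triggered samples.

    mask[i] is True when coordinate i is (nearly) constant across samples;
    pattern[i] holds that constant value, and 0.0 for unmasked coordinates.
    """
    columns = list(zip(*features))
    mask = [pstdev(col) <= tol for col in columns]
    pattern = [mean(col) if keep else 0.0 for col, keep in zip(columns, mask)]
    return mask, pattern


def hyperplane_violation(feature, mask, pattern):
    """Mean squared deviation from the pattern on the masked coordinates."""
    squared = [(f - p) ** 2 for f, p, keep in zip(feature, pattern, mask) if keep]
    return sum(squared) / len(squared) if squared else 0.0


if __name__ == "__main__":
    # Hypothetical feature vectors of three triggered inputs:
    # coordinates 0 and 2 are pinned, coordinate 1 varies freely.
    triggered = [[1.0, 0.2, 5.0], [1.0, 0.9, 5.0], [1.0, -0.3, 5.0]]
    mask, pattern = fit_axis_aligned_hyperplane(triggered)
    print(mask, pattern)
    print(hyperplane_violation([1.0, 7.0, 5.0], mask, pattern))  # on the hyperplane
    print(hyperplane_violation([0.0, 0.0, 5.0], mask, pattern))  # off the hyperplane
```

A violation term of this kind could act as a feature-space constraint during trigger reverse-engineering, in the same way that trigger size acts as an input-space constraint in earlier methods. How the paper formulates and optimizes its constraint is not specified here.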

Code Implementations: 1 repo
