LG · CV · Dec 21, 2020

Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification

arXiv:2012.11212v2 · 192 citations
AI Analysis

This research addresses the vulnerability of deep neural networks to trojan attacks for users who deploy pre-trained models, proposing a more sophisticated attack that is harder to detect.

This paper introduces a deep feature space trojan attack on neural networks, built by controlled detoxification. The authors characterize the attack as effective, stealthy, controllable, and robust, and report that it evades state-of-the-art defenses in experiments on 9 image classifiers and various datasets including ImageNet.

Trojan (backdoor) attack is a form of adversarial attack on deep neural networks where the attacker provides victims with a model trained/retrained on malicious data. The backdoor can be activated when a normal input is stamped with a certain pattern called trigger, causing misclassification. Many existing trojan attacks have their triggers being input space patches/objects (e.g., a polygon with solid color) or simple input transformations such as Instagram filters. These simple triggers are susceptible to recent backdoor detection algorithms. We propose a novel deep feature space trojan attack with five characteristics: effectiveness, stealthiness, controllability, robustness and reliance on deep features. We conduct extensive experiments on 9 image classifiers on various datasets including ImageNet to demonstrate these properties and show that our attack can evade state-of-the-art defense.
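To make the abstract's notion of a trigger concrete, here is a minimal sketch of the simple input-space patch trigger it attributes to existing attacks: a solid-color square stamped onto an otherwise normal image, with the stamped samples relabeled to an attacker-chosen class. This is not the paper's deep feature space method. The function names, patch size, color, and poisoning fraction are hypothetical choices made for illustration only.

```python
import random


def stamp_patch_trigger(image, patch_size=2, color=(255, 0, 0)):
    """Return a copy of `image` with a solid-color square in the bottom-right corner.

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    All parameters are illustrative assumptions, not values from the paper.
    """
    height, width = len(image), len(image[0])
    if patch_size > min(height, width):
        raise ValueError("patch does not fit inside the image")
    stamped = [list(row) for row in image]  # copy rows so the input is untouched
    for y in range(height - patch_size, height):
        for x in range(width - patch_size, width):
            stamped[y][x] = color
    return stamped


def poison_dataset(images, labels, target_label, poison_fraction=0.1, seed=0):
    """Stamp the trigger on a random fraction of samples and relabel them.

    Returns the poisoned images, the poisoned labels, and the chosen indices.
    """
    rng = random.Random(seed)
    n_poison = round(poison_fraction * len(images))
    chosen = rng.sample(range(len(images)), n_poison)
    poisoned_images = list(images)
    poisoned_labels = list(labels)
    for i in chosen:
        poisoned_images[i] = stamp_patch_trigger(images[i])
        poisoned_labels[i] = target_label
    return poisoned_images, poisoned_labels, chosen
```

A model trained on such data can learn to associate the patch with the target label. Because the patch is a fixed, localized pixel pattern, it is the kind of simple trigger the abstract describes as susceptible to recent backdoor detection algorithms.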

Code Implementations: 2 repos