LG · Sep 7, 2021

Trojan Signatures in DNN Weights

arXiv:2109.02836v1 · 30 citations
Originality: Incremental advance
AI Analysis

This paper addresses a critical security vulnerability in AI systems and offers practitioners a lightweight detection method, though it is an incremental advance in that it builds on existing trojan attack research.

The paper tackles the problem of detecting trojan attacks in deep neural networks by analyzing the weight distribution of the final linear layer, achieving high effectiveness across various architectures, datasets, and trigger types without needing training data or expensive computations.

Deep neural networks have been shown to be vulnerable to backdoor, or trojan, attacks where an adversary has embedded a trigger in the network at training time such that the model correctly classifies all standard inputs, but generates a targeted, incorrect classification on any input which contains the trigger. In this paper, we present the first ultra light-weight and highly effective trojan detection method that does not require access to the training/test data, does not involve any expensive computations, and makes no assumptions on the nature of the trojan trigger. Our approach focuses on analysis of the weights of the final, linear layer of the network. We empirically demonstrate several characteristics of these weights that occur frequently in trojaned networks, but not in benign networks. In particular, we show that the distribution of the weights associated with the trojan target class is clearly distinguishable from the weights associated with other classes. Using this, we demonstrate the effectiveness of our proposed detection method against state-of-the-art attacks across a variety of architectures, datasets, and trigger types.
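The core idea in the abstract, comparing the final-layer weights of each class and looking for one class that stands apart, can be illustrated with a minimal sketch. This is not the paper's exact procedure: the summary statistic (per-class mean weight), the robust outlier score (median and MAD), the threshold of 3.5, and the function names `class_weight_outlier_scores` and `flag_suspect_class` are all assumptions made here for illustration.

```python
import numpy as np


def class_weight_outlier_scores(final_weights):
    """Score how far each class's mean final-layer weight sits from the rest.

    final_weights: array of shape (num_classes, num_features), i.e. one row
    of the final linear layer per output class.
    The mean-per-class statistic and the MAD scaling are illustrative
    assumptions, not the statistic used in the paper.
    """
    w = np.asarray(final_weights, dtype=float)
    class_means = w.mean(axis=1)                  # one summary value per class
    median = np.median(class_means)
    mad = np.median(np.abs(class_means - median))
    # 1.4826 makes the MAD comparable to a standard deviation for normal data;
    # the small floor avoids division by zero when all classes are identical.
    scale = max(1.4826 * mad, 1e-12)
    return np.abs(class_means - median) / scale


def flag_suspect_class(final_weights, threshold=3.5):
    """Return (class_index, score) for the most outlying class, or (None, score)
    when no class exceeds the (assumed) threshold."""
    scores = class_weight_outlier_scores(final_weights)
    suspect = int(np.argmax(scores))
    score = float(scores[suspect])
    return (suspect, score) if score > threshold else (None, score)


if __name__ == "__main__":
    # Synthetic stand-in for a final layer: 10 classes, 512 features.
    rng = np.random.default_rng(0)
    weights = rng.normal(0.0, 0.05, size=(10, 512))
    weights[3] += 0.05  # simulate a shifted weight distribution for class 3
    print(flag_suspect_class(weights))
```

The sketch needs only the weight matrix of the last layer, which matches the abstract's claim that no training or test data is required. On a real model the matrix would be read from the network's final linear layer; the synthetic matrix above only demonstrates the mechanics.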

