Design and Evaluation of a Multi-Domain Trojan Detection Method on Deep Neural Networks
This addresses the security vulnerability of Trojan attacks in AI systems, providing a robust defense across multiple domains, though it builds incrementally on prior detection methods.
The paper tackles the problem of detecting Trojan attacks in deep neural networks by introducing STRIP-ViTA, a multi-domain detection method that works across vision, text, and audio domains, achieving 0% false rejection and false acceptance rates for vision tasks and low error rates for text and audio tasks.
This work corroborates a run-time Trojan detection method exploiting STRong Intentional Perturbation of inputs, is a multi-domain Trojan detection defence across Vision, Text and Audio domains---thus termed as STRIP-ViTA. Specifically, STRIP-ViTA is the first confirmed Trojan detection method that is demonstratively independent of both the task domain and model architectures. We have extensively evaluated the performance of STRIP-ViTA over: i) CIFAR10 and GTSRB datasets using 2D CNNs, and a public third party Trojaned model for vision tasks; ii) IMDB and consumer complaint datasets using both LSTM and 1D CNNs for text tasks; and speech command dataset using both 1D CNNs and 2D CNNs for audio tasks. Experimental results based on 28 tested Trojaned models demonstrate that STRIP-ViTA performs well across all nine architectures and five datasets. In general, STRIP-ViTA can effectively detect Trojan inputs with small false acceptance rate (FAR) with an acceptable preset false rejection rate (FRR). In particular, for vision tasks, we can always achieve a 0% FRR and FAR. By setting FRR to be 3%, average FAR of 1.1% and 3.55% are achieved for text and audio tasks, respectively. Moreover, we have evaluated and shown the effectiveness of STRIP-ViTA against a number of advanced backdoor attacks whilst other state-of-the-art methods lose effectiveness in front of one or all of these advanced backdoor attacks.