TESDA: Transform Enabled Statistical Detection of Attacks in Deep Neural Networks
This work addresses security vulnerabilities in DNNs deployed in safety-critical applications, offering a practical and flexible detection approach that requires neither dedicated hardware nor the presence of a Trojan trigger.
The paper tackles the problem of detecting attacks on deep neural networks by introducing TESDA, a method that exploits the discrepancies attacks cause in the distributions of intermediate layer features, achieving detection coverages above 95% with operation count overheads as low as 1-2%.
Deep neural networks (DNNs) are now the de facto choice for computer vision tasks such as image classification. However, their complexity and "black box" nature often render the systems they're deployed in vulnerable to a range of security threats. Successfully identifying such threats, especially in safety-critical real-world applications, is thus of utmost importance, but remains very much an open problem. We present TESDA, a low-overhead, flexible, and statistically grounded method for online detection of attacks that exploits the discrepancies they cause in the distributions of intermediate layer features of DNNs. Unlike most prior work, we require neither dedicated hardware to run in real time nor the presence of a Trojan trigger to detect discrepancies in behavior. We empirically establish our method's usefulness and practicality across multiple architectures, datasets, and diverse attacks, consistently achieving detection coverages above 95% with operation count overheads as low as 1-2%.
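The general idea of flagging inputs whose intermediate layer features deviate from the distribution seen on clean data can be illustrated with a minimal sketch. This is not TESDA's actual procedure: the abstract does not specify the transform or the statistical test, so the sketch substitutes a plain Mahalanobis distance score with a quantile threshold. The feature arrays are synthetic stand-ins for real layer activations, and all function names and parameters (fit_clean_statistics, mahalanobis_sq, fit_threshold, false_positive_rate) are hypothetical.

```python
import numpy as np


def fit_clean_statistics(features):
    """Estimate the mean and inverse covariance of clean feature vectors."""
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])  # small ridge term for numerical stability
    return mean, np.linalg.inv(cov)


def mahalanobis_sq(features, mean, cov_inv):
    """Squared Mahalanobis distance of each row to the clean distribution."""
    centered = features - mean
    return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)


def fit_threshold(clean_scores, false_positive_rate=0.01):
    """Pick the score above which an input is flagged (assumed target rate)."""
    return np.quantile(clean_scores, 1.0 - false_positive_rate)


# Synthetic stand-ins for intermediate-layer features (an assumption; in
# practice these would be activations extracted from a chosen DNN layer).
rng = np.random.default_rng(0)
dim = 16
clean_train = rng.normal(size=(5000, dim))
clean_test = rng.normal(size=(1000, dim))
attacked = rng.normal(loc=2.0, size=(1000, dim))  # shifted distribution

mean, cov_inv = fit_clean_statistics(clean_train)
threshold = fit_threshold(mahalanobis_sq(clean_train, mean, cov_inv))

# Fraction of held-out clean and attacked inputs that exceed the threshold.
false_alarm_rate = float(np.mean(mahalanobis_sq(clean_test, mean, cov_inv) > threshold))
detection_rate = float(np.mean(mahalanobis_sq(attacked, mean, cov_inv) > threshold))
print(f"false alarms: {false_alarm_rate:.3f}, detections: {detection_rate:.3f}")
```

In an online setting, the statistics and threshold would be fitted once on clean data, so that each incoming input costs only one distance computation on top of the forward pass. The synthetic shift used here is far cleaner than what a real attack would induce, so the rates this sketch prints say nothing about the coverage reported in the paper.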