Anomaly Detection of Adversarial Examples using Class-conditional Generative Adversarial Networks
This work addresses the vulnerability of DNNs to adversarial attacks, a critical security issue for AI systems. The approach is incremental, however, because it builds on existing GAN-based methods.
The paper proposes an unsupervised detector of adversarial examples for deep neural networks, built on class-conditional GANs, and reports that it outperforms previous detection methods in experiments on image datasets under various attacks.
Deep Neural Networks (DNNs) have been shown to be vulnerable to Test-Time Evasion attacks (TTEs, or adversarial examples), which alter the DNN's decision by making small changes to the input. We propose an unsupervised attack detector for DNN classifiers based on class-conditional Generative Adversarial Networks (GANs). We model the distribution of clean data conditioned on the predicted class label with an Auxiliary Classifier GAN (AC-GAN). Given a test sample and its predicted class, three detection statistics are calculated based on the AC-GAN Generator and Discriminator. Experiments on image classification datasets under various TTE attacks show that our method outperforms previous detection methods. We also investigate the effectiveness of anomaly detection using different DNN layers (input features or internal-layer features) and demonstrate, as one might expect, that anomalies are harder to detect using features closer to the DNN's output layer.
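To make the detection step concrete, the following is a minimal sketch of how three statistics could be computed from a class-conditional generator and a discriminator, given a test sample and its predicted class. The abstract does not define the three statistics, so the ones used here (a discriminator source score, an auxiliary class score, and a reconstruction error from a crude random-search inversion of the generator) are assumptions chosen for illustration. The toy `generator`, `discriminator`, and `detection_statistics` functions are hypothetical stand-ins, not the paper's models.

```python
import numpy as np

# Toy stand-ins for a trained AC-GAN (assumption: two classes in 2-D).
CLASS_MEANS = np.array([[0.0, 0.0], [3.0, 3.0]])

def generator(z, label):
    """Hypothetical class-conditional generator: class mean plus scaled noise."""
    return CLASS_MEANS[label] + 0.1 * z

def discriminator(x):
    """Hypothetical discriminator returning (source score, class posterior)."""
    sq_dist = np.sum((CLASS_MEANS - x) ** 2, axis=1)
    source_score = np.exp(-sq_dist.min())      # high when x is near clean data
    logits = -sq_dist
    probs = np.exp(logits - logits.max())
    return source_score, probs / probs.sum()

def detection_statistics(x, predicted_class, latent_dim=2,
                         n_candidates=256, seed=0):
    """Compute three illustrative statistics for one test sample."""
    rng = np.random.default_rng(seed)
    source_score, class_probs = discriminator(x)
    # Crude generator inversion: keep the closest of many random samples
    # drawn from the generator conditioned on the predicted class.
    latents = rng.standard_normal((n_candidates, latent_dim))
    samples = np.stack([generator(z, predicted_class) for z in latents])
    recon_error = np.linalg.norm(samples - x, axis=1).min()
    return {
        "source_score": float(source_score),
        "class_score": float(class_probs[predicted_class]),
        "recon_error": float(recon_error),
    }

x = np.array([0.1, -0.1])  # sample lying near class 0
consistent = detection_statistics(x, predicted_class=0)
# Simulate a fooled classifier: same input, but predicted as class 1.
mismatched = detection_statistics(x, predicted_class=1)
print(consistent)
print(mismatched)
```

In this toy setting, a sample whose predicted class disagrees with the class-conditional model yields a larger reconstruction error and a smaller class score, which is the kind of discrepancy a threshold-based detector could flag. A real detector would replace the random search with a proper inversion of the trained generator and would calibrate thresholds on clean data.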