CV · Nov 18, 2020

Adversarial Profiles: Detecting Out-Distribution & Adversarial Samples in Pre-trained CNNs

arXiv:2011.09123v12.32 citations

Originality Incremental advance

AI Analysis

This work addresses the vulnerability of pre-trained CNNs to adversarial and out-of-distribution examples, a significant obstacle to deploying robust AI systems, particularly for practitioners without access to diverse attack data.

This paper proposes a method to detect out-of-distribution and adversarial examples for pre-trained Convolutional Neural Networks without retraining or extensive fooling examples. The method creates adversarial profiles for each class using a single attack generation technique, achieving 92% detection of out-of-distribution examples and 59% of adversarial examples on the MNIST dataset.

Despite the high accuracy of Convolutional Neural Networks (CNNs), they are vulnerable to adversarial and out-distribution examples. Many methods have been proposed to detect these fooling examples or to make CNNs robust against them. However, most such methods need access to a wide range of fooling examples to retrain the network or to tune detection parameters. Here, we propose a method to detect adversarial and out-distribution examples against a pre-trained CNN without needing to retrain the CNN or to access a wide variety of fooling examples. To this end, we create adversarial profiles for each class using only one adversarial attack generation technique. We then wrap a detector around the pre-trained CNN that applies the created adversarial profile to each input and uses the output to decide whether or not the input is legitimate. Our initial evaluation of this approach using the MNIST dataset shows that adversarial-profile-based detection is effective in detecting at least 92% of out-distribution examples and 59% of adversarial examples.
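
The detector wrapper described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes a class profile is a set of perturbations, each meant to push inputs of that class to a specific target class, and that an input is accepted when enough perturbations produce their expected target. The names `is_legitimate`, `profiles`, and `min_match`, and the toy nearest-centroid classifier standing in for the pre-trained CNN, are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Classifier = Callable[[Sequence[float]], int]
# Assumed layout: profiles[c][t] is a perturbation intended to move
# class-c inputs to class t. The abstract does not specify this structure.
Profiles = Dict[int, Dict[int, Vector]]


def is_legitimate(x: Sequence[float], classify: Classifier,
                  profiles: Profiles, min_match: float = 0.5) -> bool:
    """Apply the profile of the predicted class and check the outcomes.

    min_match is a hypothetical threshold: the fraction of perturbations
    that must yield their intended target for x to be accepted.
    """
    predicted = classify(x)
    profile = profiles[predicted]
    if not profile:
        raise ValueError("profile for the predicted class is empty")
    hits = 0
    for target, delta in profile.items():
        perturbed = [xi + di for xi, di in zip(x, delta)]
        if classify(perturbed) == target:
            hits += 1
    return hits / len(profile) >= min_match


# Toy stand-in for a pre-trained CNN: nearest centroid over three classes.
CENTROIDS = {0: (0.0, 0.0), 1: (4.0, 0.0), 2: (0.0, 4.0)}


def toy_classify(x: Sequence[float]) -> int:
    def sq_dist(c: int) -> float:
        return sum((xi - ci) ** 2 for xi, ci in zip(x, CENTROIDS[c]))
    return min(CENTROIDS, key=sq_dist)


# Hand-made profiles for the toy classifier (not generated by an attack).
toy_profiles: Profiles = {
    0: {1: [3.0, 0.0], 2: [0.0, 3.0]},
    1: {0: [-3.0, 0.0], 2: [-3.0, 3.0]},
    2: {0: [0.0, -3.0], 1: [3.0, -3.0]},
}

clean_ok = is_legitimate([0.2, -0.1], toy_classify, toy_profiles)
outlier_ok = is_legitimate([-10.0, -10.0], toy_classify, toy_profiles)
```

In this toy setup the point near the class-0 centroid reacts to both perturbations as the profile expects, while the far-away point is still labelled class 0 after each perturbation and is therefore flagged. In a real setting the perturbations would come from the single attack-generation technique the abstract mentions, and the acceptance rule would need to be tuned on held-out data.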
