Closeness and Uncertainty Aware Adversarial Examples Detection in Adversarial Machine Learning
This work provides a robust defense mechanism against adversarial attacks, which is crucial for deploying Deep Neural Networks in security-critical applications.
This paper addresses the problem of detecting adversarial examples by combining moment-based predictive uncertainty estimates from Monte-Carlo Dropout Sampling with a novel method operating in the subspace of deep features. The combined approach achieves up to 99% ROC-AUC scores for adversarial example detection across various datasets and attack algorithms.
Although state-of-the-art Deep Neural Network (DNN) models are considered robust to random perturbations, these architectures have been shown to be highly vulnerable to deliberately crafted perturbations, even when those perturbations are quasi-imperceptible. These vulnerabilities make it challenging to deploy DNN models in security-critical areas. In recent years, many studies have developed new attack methods and proposed new defense techniques that enable more robust and reliable models. In this work, we explore and assess the use of different types of metrics for detecting adversarial samples. We first leverage moment-based predictive uncertainty estimates of a DNN classifier obtained using Monte-Carlo Dropout Sampling. We also introduce a new method that operates in the subspace of deep features extracted by the model. We verified the effectiveness of our approach on a range of standard datasets: MNIST (Digit), MNIST (Fashion), and CIFAR-10. Our experiments show that these two approaches complement each other, and that the combined use of all the proposed metrics yields up to 99% ROC-AUC scores regardless of the attack algorithm.
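To make the idea of a moment-based uncertainty estimate from Monte-Carlo Dropout Sampling concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a toy two-layer classifier with hypothetical weights `w1` and `w2`, keeps dropout active at inference, and summarizes the sampled softmax outputs by their mean (first moment) and by their variance summed over classes (one possible second-moment score). The function names, the dropout rate, and the choice of summed variance as the score are all illustrative assumptions; the deep-feature subspace method is not sketched because the abstract does not specify it.

```python
import math
import random


def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward_with_dropout(x, w1, w2, p_drop, rng):
    # Hidden layer with ReLU; w1 holds one weight vector per hidden unit.
    hidden = [max(0.0, sum(xi * w for xi, w in zip(x, row))) for row in w1]
    # Inverted dropout, deliberately left ON at inference time.
    keep = 1.0 - p_drop
    hidden = [h / keep if rng.random() >= p_drop else 0.0 for h in hidden]
    # Output layer; w2 holds one weight vector per class.
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in w2]
    return softmax(logits)


def mc_dropout_moments(x, w1, w2, n_samples=100, p_drop=0.5, seed=0):
    """Return (mean class probabilities, summed predictive variance)."""
    rng = random.Random(seed)
    samples = [forward_with_dropout(x, w1, w2, p_drop, rng)
               for _ in range(n_samples)]
    n_classes = len(w2)
    # First moment: average softmax output over the stochastic passes.
    mean = [sum(s[c] for s in samples) / n_samples for c in range(n_classes)]
    # Second central moment per class, summed into one scalar score.
    var = [sum((s[c] - mean[c]) ** 2 for s in samples) / n_samples
           for c in range(n_classes)]
    return mean, sum(var)


# Hypothetical weights: 2 inputs -> 3 hidden units -> 2 classes.
w1 = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]
w2 = [[1.0, -1.0, 0.5], [-0.5, 1.0, 0.2]]
mean_probs, uncertainty = mc_dropout_moments([1.0, 2.0], w1, w2)
```

In a detector built along these lines, an input would be flagged as potentially adversarial when `uncertainty` exceeds a threshold chosen on validation data; sweeping that threshold is what produces the ROC curve whose area the abstract reports.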