LG CR · Jul 1, 2021

Using Anomaly Feature Vectors for Detecting, Classifying and Warning of Outlier Adversarial Examples

Nelson Manohar-Alers, Ryan Feng, Sahib Singh, Jiguo Song, Atul Prakash

arXiv:2107.00561v1 · 1.6

Originality: Incremental advance

AI Analysis

This work addresses adversarial robustness for machine learning systems by enabling attack-specific mitigation, though it appears incremental as it builds on existing detection methods.

The paper tackles the problem of detecting and classifying adversarial attacks on neural networks, achieving close to 93% accuracy in distinguishing attack types like PGD and Carlini-Wagner on CIFAR-10.

We present DeClaW, a system for detecting, classifying, and warning of adversarial inputs presented to a classification neural network. In contrast to current state-of-the-art methods, which only detect whether a given input is clean or adversarial, we also aim to identify the type of adversarial attack (e.g., PGD or Carlini-Wagner). To achieve this, we extract statistical profiles, which we term anomaly feature vectors (AFVs), from a set of latent features. Preliminary findings suggest that AFVs can help distinguish among several types of adversarial attacks (e.g., PGD versus Carlini-Wagner) with close to 93% accuracy on the CIFAR-10 dataset. The results open the door to using AFV-based methods not only for adversarial attack detection but also for classifying the attack type and then designing attack-specific mitigation strategies.
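
The abstract does not say which statistics make up an anomaly feature vector, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a statistical profile can be built by z-scoring an input's latent features against a baseline fitted on clean data and then summarizing the z-scores. The function names, the choice of summary statistics, and the thresholds are all hypothetical.

```python
import numpy as np

def fit_clean_baseline(clean_latents):
    """Per-dimension mean and std of latent features over clean inputs.

    clean_latents: array of shape (n_samples, n_features).
    """
    mu = clean_latents.mean(axis=0)
    sigma = clean_latents.std(axis=0) + 1e-8  # guard against zero variance
    return mu, sigma

def anomaly_feature_vector(latent, mu, sigma, thresholds=(1.0, 2.0, 3.0)):
    """Summarize how far one input's latent features sit from the clean baseline.

    The six summary statistics below are illustrative assumptions,
    not the feature set used in the paper.
    """
    z = (latent - mu) / sigma          # per-dimension z-scores
    abs_z = np.abs(z)
    # Fraction of latent dimensions whose |z| exceeds each threshold.
    tail_fractions = [float((abs_z > t).mean()) for t in thresholds]
    return np.array([z.mean(), z.std(), abs_z.max(), *tail_fractions])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(500, 64))   # stand-in for latent features of clean inputs
    mu, sigma = fit_clean_baseline(clean)
    suspect = rng.normal(size=64) + 1.5  # stand-in for a shifted (possibly adversarial) input
    print(anomaly_feature_vector(suspect, mu, sigma))
```

Under these assumptions, a vector like this would be computed for each input and passed to a separate classifier trained to label it as clean or as a specific attack type.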
