Identifying Classes Susceptible to Adversarial Attacks
This work addresses the challenge of improving adversarial robustness for image classification systems, but its contribution is incremental: it identifies susceptible classes rather than proposing a new defense method.
The paper tackles the problem of identifying which classes in deep learning image classifiers are most vulnerable to adversarial attacks. It uses distance-based measures to map original classes to adversarial classes, which reduces model randomness in an adversarial setting. Experiments are reported on the MNIST, Fashion MNIST, and CIFAR-10 datasets.
Despite numerous attempts to defend deep learning based image classifiers, they remain susceptible to adversarial attacks. This paper proposes a technique to identify susceptible classes, that is, the classes that are more easily subverted. To identify them, we apply distance-based measures to a trained model. Based on the distances among the original classes, we create a mapping between original classes and adversarial classes that significantly reduces the randomness of a model in an adversarial setting. We analyze the high-dimensional geometry among the feature classes and identify the k most susceptible target classes in an adversarial attack. We conduct experiments on the MNIST, Fashion MNIST, and CIFAR-10 (ImageNet and ResNet-32) datasets. Finally, we evaluate our techniques to determine which distance-based measure works best and how the randomness of a model changes with perturbation.
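The distance-based idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the procedure, so everything here is an assumption: each class is represented by the mean of its feature vectors (a centroid), distances between centroids are computed under a chosen metric, and the k nearest classes are taken as the likeliest adversarial targets. The function names (`class_centroids`, `pairwise_distances`, `k_nearest_classes`) and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def class_centroids(features, labels, num_classes):
    """Mean feature vector per class (assumed class representation)."""
    return np.stack(
        [features[labels == c].mean(axis=0) for c in range(num_classes)]
    )


def pairwise_distances(centroids, metric="euclidean"):
    """Distance between every pair of class centroids."""
    diff = centroids[:, None, :] - centroids[None, :, :]
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=-1)
    if metric == "chebyshev":
        return np.abs(diff).max(axis=-1)
    raise ValueError(f"unknown metric: {metric}")


def k_nearest_classes(dist, k):
    """For each class, the k closest other classes (candidate targets)."""
    d = dist.astype(float)          # astype returns a copy
    np.fill_diagonal(d, np.inf)     # a class is not its own target
    return np.argsort(d, axis=1)[:, :k]


# Toy 2-D features for three classes (illustrative data only).
features = np.array(
    [[0.0, 0.0], [0.0, 2.0], [3.0, 0.0], [3.0, 2.0], [20.0, 0.0], [20.0, 2.0]]
)
labels = np.array([0, 0, 1, 1, 2, 2])

centroids = class_centroids(features, labels, num_classes=3)
dist = pairwise_distances(centroids, metric="euclidean")
targets = k_nearest_classes(dist, k=1)
print(targets.tolist())
```

In practice the features would come from a trained model (for example, penultimate-layer activations), and swapping the `metric` argument gives a simple way to compare distance measures, which is the kind of comparison the abstract describes.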