Toward Robust Image Classification
This work addresses robustness in image classification for security-critical applications, but the contribution is incremental: it combines existing techniques rather than introducing a new one.
The paper tackles adversarial misclassification in neural network image classifiers with a detection model that combines dropout randomization and preprocessing, applied to images within a given Bayesian uncertainty. On MNIST, it reports 97% average detection accuracy and 99% classification accuracy after discarding flagged images.
Neural networks are frequently used for image classification but can be vulnerable to misclassification caused by adversarial images. Attempts to make neural network image classification more robust have included variations on preprocessing (cropping, applying noise, blurring), adversarial training, and dropout randomization. In this paper, we implemented a model for adversarial detection based on a combination of two of these techniques: dropout randomization with preprocessing applied to images within a given Bayesian uncertainty. We evaluated our model on the MNIST dataset, using adversarial images generated with the Fast Gradient Sign Method (FGSM), the Jacobian-based Saliency Map Attack (JSMA), and the Basic Iterative Method (BIM). Our model achieved an average adversarial image detection accuracy of 97%, with an average image classification accuracy of 99% after discarding images flagged as adversarial. Our average detection accuracy exceeded that of recent papers using similar techniques.
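To make the idea of dropout-based uncertainty more concrete, the following is a minimal sketch of Monte Carlo dropout used as an adversarial-image flag. It is not the model described above: the two-layer network, the use of predictive entropy as the uncertainty measure, the threshold, and all function names (`mc_dropout_predict`, `predictive_entropy`, `flag_adversarial`) are assumptions chosen for illustration, and the preprocessing step is omitted.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mc_dropout_predict(x, W1, W2, rng, passes=50, drop_rate=0.5):
    """Run `passes` stochastic forward passes with dropout left active.

    x is a flattened image (1-D array); W1 and W2 are the weights of a
    hypothetical two-layer network standing in for a trained classifier.
    """
    probs = []
    for _ in range(passes):
        h = np.maximum(x @ W1, 0.0)              # hidden ReLU layer
        mask = rng.random(h.shape) >= drop_rate  # fresh dropout mask per pass
        h = h * mask / (1.0 - drop_rate)         # inverted-dropout scaling
        probs.append(softmax(h @ W2))
    return np.stack(probs)                       # shape: (passes, n_classes)


def predictive_entropy(probs):
    # Entropy of the mean prediction; one common uncertainty estimate
    # (assumed here, the abstract does not name the measure used).
    mean = probs.mean(axis=0)
    return float(-(mean * np.log(mean + 1e-12)).sum())


def flag_adversarial(x, W1, W2, rng, threshold, passes=50):
    """Return (flagged, predicted_label, uncertainty) for one image."""
    probs = mc_dropout_predict(x, W1, W2, rng, passes)
    uncertainty = predictive_entropy(probs)
    label = int(probs.mean(axis=0).argmax())
    return uncertainty > threshold, label, uncertainty
```

In a setup like this, the threshold would typically be tuned on held-out clean and adversarial images, and images flagged as adversarial would be discarded before classification accuracy is measured.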