Locally optimal detection of stochastic targeted universal adversarial perturbations
This work provides an improved method for detecting adversarial attacks, a critical security concern for users of deep learning image classifiers.
This paper addresses the vulnerability of deep learning image classifiers to small adversarial perturbations. The authors derive a locally optimal generalized likelihood ratio test (LO-GLRT) based detector for stochastic targeted universal adversarial perturbations (UAPs) and demonstrate its superior performance compared to other detection methods on popular image classification datasets.
Deep learning image classifiers are known to be vulnerable to small adversarial perturbations of input images. In this paper, we derive the locally optimal generalized likelihood ratio test (LO-GLRT) based detector for stochastic targeted universal adversarial perturbations (UAPs) of the classifier inputs. We also describe a supervised training method to learn the detector's parameters, and show that the detector outperforms other detection methods on several popular image classification datasets.