Detecting Adversarial Examples through Nonlinear Dimensionality Reduction
The work addresses security concerns in AI systems for applications such as image recognition, but its contribution is incremental: the evaluation covers only non-adaptive attackers, and adaptive attackers are left to future work.
The paper tackles the vulnerability of deep neural networks to adversarial examples by proposing a detection method that combines nonlinear dimensionality reduction with density estimation. Its empirical results show effective detection against non-adaptive attackers.
Deep neural networks are vulnerable to adversarial examples, i.e., carefully perturbed inputs designed to mislead classification. This work proposes a detection method that combines nonlinear dimensionality reduction with density estimation. Our empirical findings show that the proposed approach effectively detects adversarial examples crafted by non-adaptive attackers, i.e., attackers who do not specifically tune their attacks to bypass the detection method. Given these promising results, we plan to extend our analysis to adaptive attackers in future work.
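The general idea of pairing a nonlinear embedding with a density estimate can be illustrated with a small sketch. The code below is not the method evaluated in this work: the abstract does not name the specific techniques, so a one-component RBF kernel PCA (computed by power iteration) stands in for the nonlinear dimensionality reduction, and a Gaussian kernel density estimate stands in for the density estimation. The toy 2-D points, the function names (`fit_detector`, `embed`, `density`, `is_suspicious`), and the parameters `gamma`, `bandwidth`, and the 0.5 threshold factor are all assumptions made for illustration.

```python
import math


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel between two points."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def embed(model, x):
    """Project x onto the first kernel principal component (1-D embedding)."""
    train, gamma = model["train"], model["gamma"]
    k = [rbf_kernel(x, t, gamma) for t in train]
    k_mean = sum(k) / len(k)
    # Center the kernel vector consistently with the centered training kernel.
    return sum(
        a * (kj - k_mean - rm + model["total_mean"])
        for a, kj, rm in zip(model["alpha"], k, model["row_mean"])
    )


def density(model, x):
    """Gaussian kernel density estimate of x in the embedded space."""
    z, h = embed(model, x), model["bandwidth"]
    norm = h * math.sqrt(2.0 * math.pi)
    vals = [math.exp(-0.5 * ((z - zi) / h) ** 2) / norm for zi in model["embedded"]]
    return sum(vals) / len(vals)


def fit_detector(train, gamma=1.0, bandwidth=0.1, iters=50):
    """Fit the stand-in detector on clean points (all parameters are assumptions)."""
    n = len(train)
    K = [[rbf_kernel(a, b, gamma) for b in train] for a in train]
    row_mean = [sum(row) / n for row in K]
    total_mean = sum(row_mean) / n
    # Doubly centered kernel matrix.
    Kc = [[K[i][j] - row_mean[i] - row_mean[j] + total_mean for j in range(n)]
          for i in range(n)]
    # Power iteration for the leading eigenvector (deterministic start vector).
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Kv = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigval = sum(a * b for a, b in zip(v, Kv))  # Rayleigh quotient
    model = {
        "train": train, "gamma": gamma, "bandwidth": bandwidth,
        "row_mean": row_mean, "total_mean": total_mean,
        "alpha": [x / math.sqrt(eigval) for x in v],
    }
    model["embedded"] = [embed(model, x) for x in train]
    # Assumed rule: flag anything below half the lowest clean density.
    model["threshold"] = 0.5 * min(density(model, x) for x in train)
    return model


def is_suspicious(model, x):
    """Flag x when its embedded density falls below the fitted threshold."""
    return density(model, x) < model["threshold"]


# Toy data: two tight clusters standing in for clean representations of two classes.
cluster_a = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2)]
cluster_b = [(x + 3.0, y + 3.0) for x, y in cluster_a]
model = fit_detector(cluster_a + cluster_b)

between = (1.6, 1.6)   # lies between the two clusters
in_cluster = (0.1, 0.1)  # lies inside the first cluster
print(is_suspicious(model, between), is_suspicious(model, in_cluster))
```

In this toy setting, a point lying between the two clusters lands in a low-density region of the embedding, while a point inside a cluster does not. A real detector would operate on network representations rather than 2-D points, and, as with the approach described above, a fixed threshold of this kind says nothing about adaptive attackers who tune their perturbations against the detector.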