Potential adversarial samples for white-box attacks
This work addresses the robustness of deep learning classifiers by identifying potential adversarial samples at low cost, though it is incremental in that it builds on existing attack methods.
The paper tackles the vulnerability of deep convolutional neural networks to adversarial attacks by proposing a low-cost method that identifies potential adversarial samples near decision boundaries. This reduces the search space for attacks to 61% of the test data while still covering more than 82% of the adversarial samples from iFGSM and 92% of those from DeepFool on CIFAR10.
Deep convolutional neural networks can be highly vulnerable to small perturbations of their inputs, which can seriously limit system robustness when deep networks are used as classifiers. In this paper we propose a low-cost method to explore marginal samples near the decision boundaries of a trained classifier, thus identifying potential adversarial samples. Restricting adversarial attack algorithms to these potential adversarial samples reduces their search space while keeping a reasonable rate of successful perturbation. With our strategy, the potential adversarial samples represent only 61% of the test data, yet cover more than 82% of the adversarial samples produced by iFGSM and 92% of the samples successfully perturbed by DeepFool on CIFAR10.
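To make the two reported quantities concrete (the share of test data selected and the share of adversarial samples covered), the following is a minimal sketch. The abstract does not state how marginal samples are selected, so the top-2 softmax margin used here is an assumption, not necessarily the paper's criterion. The function names, the threshold, and the toy data are all hypothetical.

```python
from typing import Sequence, Set


def top2_margin(probs: Sequence[float]) -> float:
    """Gap between the two largest class probabilities.

    A small gap suggests the sample lies near a decision boundary.
    (Assumed criterion; the abstract does not specify one.)
    """
    ordered = sorted(probs, reverse=True)
    return ordered[0] - ordered[1]


def select_candidates(all_probs: Sequence[Sequence[float]],
                      threshold: float) -> Set[int]:
    """Indices of samples whose top-2 margin is below a hypothetical threshold."""
    return {i for i, p in enumerate(all_probs) if top2_margin(p) < threshold}


def coverage(candidates: Set[int], attacked: Set[int]) -> float:
    """Fraction of successfully attacked samples that are also candidates."""
    if not attacked:
        return 0.0
    return len(candidates & attacked) / len(attacked)


# Toy softmax outputs for five samples (illustrative values only).
probs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.50, 0.45, 0.05],  # marginal
    [0.40, 0.35, 0.25],  # marginal
    [0.90, 0.05, 0.05],  # confident prediction
    [0.55, 0.40, 0.05],  # marginal
]
# Hypothetical indices of samples that some attack managed to flip.
attacked = {1, 2, 3}

candidates = select_candidates(probs, threshold=0.2)
fraction_selected = len(candidates) / len(probs)  # share of the data searched
cov = coverage(candidates, attacked)              # share of attacks covered

print(sorted(candidates), fraction_selected, cov)
```

In this toy setup the candidate set plays the role of the 61% of test data, and `cov` plays the role of the 82% (iFGSM) and 92% (DeepFool) coverage figures. An attack restricted to the candidates misses only the attacked samples that fall outside that set.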