Adversarial Examples Detection beyond Image Space
This work addresses a critical security vulnerability of deep neural networks in applications such as image recognition, although the contribution is incremental in that it builds on prior detection-based defenses.
The paper tackles the problem of detecting adversarial examples with extremely slight perturbations by proposing a two-stream architecture that analyzes both pixel artifacts and confidence artifacts, achieving superior performance over existing methods under oblivious attacks and demonstrating effectiveness against omniscient attacks.
Deep neural networks have been shown to be vulnerable to adversarial examples, which are generated by adding human-imperceptible perturbations to images. To defend against these adversarial examples, various detection-based methods have been proposed. However, most of them perform poorly at detecting adversarial examples with extremely slight perturbations. By exploring these adversarial examples, we find that there exists compliance between perturbations and prediction confidence, which guides us to detect few-perturbation attacks from the aspect of prediction confidence. To detect both few-perturbation and large-perturbation attacks, we propose a method that goes beyond image space using a two-stream architecture, in which the image stream focuses on pixel artifacts and the gradient stream handles confidence artifacts. Experimental results show that the proposed method outperforms existing methods under oblivious attacks and is also effective in defending against omniscient attacks.
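The two-stream idea can be illustrated with a minimal sketch. The abstract does not say how the gradient stream's input is computed, so this sketch assumes it is the gradient of the log of the top-class confidence with respect to the input. A toy linear softmax classifier stands in for a deep network, and all function names (`predict`, `confidence_gradient`, `two_stream_features`) are hypothetical. The detector that would consume the combined features is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    """Toy linear classifier (an assumption, standing in for a deep network).

    weights: one row of input weights per class; x: flattened input.
    Returns the class confidences.
    """
    logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return softmax(logits)


def confidence_gradient(weights, x):
    """Gradient of log(top-class confidence) with respect to x.

    For a linear softmax model:
        d log p_top / d x_j = W[top][j] - sum_k p_k * W[k][j]
    """
    probs = predict(weights, x)
    top = max(range(len(probs)), key=probs.__getitem__)
    return [
        weights[top][j] - sum(p * row[j] for p, row in zip(probs, weights))
        for j in range(len(x))
    ]


def two_stream_features(weights, x):
    """Concatenate the image-stream input (raw pixels) with the
    gradient-stream input (confidence gradient). Assumed layout only."""
    image_stream = list(x)
    gradient_stream = confidence_gradient(weights, x)
    return image_stream + gradient_stream


# Two classes, two "pixels"; the input sits on the decision boundary.
toy_weights = [[1.0, 0.0], [0.0, 1.0]]
toy_input = [0.0, 0.0]
features = two_stream_features(toy_weights, toy_input)
```

With a real network, the gradient would come from automatic differentiation rather than a closed-form expression, and each stream would typically pass through its own feature extractor before the two are fused.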