CV · CR · LG · Oct 29, 2020

Can the state of relevant neurons in a deep neural networks serve as indicators for detecting adversarial attacks?

arXiv:2010.15974v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses the security risk that adversarial attacks pose to users of image classification models. It is an incremental contribution because it builds on existing detection methods.

The paper tackles the problem of detecting adversarial attacks in deep neural networks by monitoring a sparse set of neurons relevant to the model's predicted classes, and it achieves comparable accuracy to state-of-the-art detectors in recognizing adversarial samples.

We present a method for adversarial attack detection based on the inspection of a sparse set of neurons. We follow the hypothesis that adversarial attacks introduce imperceptible perturbations in the input and that these perturbations change the state of neurons relevant to the concepts modelled by the attacked model. Therefore, monitoring the state of these neurons should enable the detection of adversarial attacks. Focusing on the image classification task, our method identifies neurons that are relevant to the classes predicted by the model. A deeper qualitative inspection of this sparse set of neurons indicates that their state changes in the presence of adversarial samples. Moreover, quantitative results from our empirical evaluation indicate that our method recognizes adversarial samples produced by state-of-the-art attack methods with accuracy comparable to that of state-of-the-art detectors.
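The core idea (find the neurons relevant to each predicted class, then flag inputs whose relevant neurons are not in their usual state) can be illustrated with a minimal toy sketch. This is not the authors' method: relevance is approximated here by the k neurons with the highest mean activation on clean samples of each class, a neuron counts as "on" when its activation exceeds a fixed threshold, and the class name `RelevantNeuronDetector` and the parameters `k`, `tau` and `min_active_fraction` are hypothetical choices made for illustration only. The activations are synthetic rather than taken from a real network.

```python
import numpy as np


class RelevantNeuronDetector:
    """Hypothetical sketch: flag an input when the neurons relevant to its
    predicted class are mostly inactive. Relevance = top-k mean activation
    on clean data (an assumed proxy, not the paper's criterion)."""

    def __init__(self, k=5, tau=0.5, min_active_fraction=0.6):
        self.k = k                                      # relevant neurons kept per class (assumed)
        self.tau = tau                                  # activation level above which a neuron is "on" (assumed)
        self.min_active_fraction = min_active_fraction  # detection threshold (assumed)
        self.relevant = {}

    def fit(self, activations, labels):
        """activations: (n_samples, n_neurons) from one layer on clean inputs;
        labels: (n_samples,) classes predicted for those inputs."""
        for c in np.unique(labels):
            mean_act = activations[labels == c].mean(axis=0)
            # Keep the indices of the k neurons with the highest mean activation for class c.
            self.relevant[int(c)] = np.argsort(mean_act)[::-1][: self.k]
        return self

    def active_fraction(self, activation, predicted_class):
        """Fraction of the predicted class's relevant neurons that are "on"."""
        idx = self.relevant[int(predicted_class)]
        return float(np.mean(activation[idx] > self.tau))

    def is_adversarial(self, activation, predicted_class):
        """Suspicious if too few relevant neurons are in their usual active state."""
        return self.active_fraction(activation, predicted_class) < self.min_active_fraction


# Toy data: class 0 drives neurons 0-4, class 1 drives neurons 5-9.
rng = np.random.default_rng(0)
n, d = 200, 10
labels = rng.integers(0, 2, size=n)
acts = rng.uniform(0.0, 0.2, size=(n, d))
acts[labels == 0, :5] += 1.0
acts[labels == 1, 5:] += 1.0

det = RelevantNeuronDetector(k=5).fit(acts, labels)

clean = np.zeros(d)
clean[:5] = 1.0        # class-0 pattern, predicted as class 0
attacked = np.zeros(d)
attacked[5:] = 1.0     # class-1 internal pattern, but the model predicts class 0

print(det.is_adversarial(clean, 0))     # False
print(det.is_adversarial(attacked, 0))  # True
```

The "attacked" vector stands in for an input whose prediction was pushed to class 0 while the neurons that normally support class 0 stay inactive, which is the kind of state change the hypothesis above describes. In practice the activations would come from a chosen layer of the attacked model, and the thresholds would need calibrating on held-out clean data.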
