CV · Aug 22, 2019

Saliency Methods for Explaining Adversarial Attacks

arXiv:1908.08413v4 · 35 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for better interpretability in adversarial machine learning. Its contribution is incremental, since it builds on existing saliency methods.

The paper tackles the problem of explaining adversarial attacks on neural networks with saliency methods. It shows that Guided Backpropagation, although criticized for merely performing (partial) image recovery, contains class-discriminative information, and it proposes an enhanced version that achieves state-of-the-art performance in explaining adversarial classifications.

The classification decisions of neural networks can be misled by small, imperceptible perturbations. This work aims to explain these misled classifications using saliency methods. The idea behind saliency methods is to explain the classification decisions of neural networks by creating so-called saliency maps. Unfortunately, a number of recent publications have shown that many of the proposed saliency methods do not provide insightful explanations. A prominent example is Guided Backpropagation (GuidedBP), which simply performs (partial) image recovery. However, our numerical analysis shows that the saliency maps created by GuidedBP do indeed contain class-discriminative information. We propose a simple and efficient way to enhance these saliency maps. The proposed enhanced GuidedBP achieves state-of-the-art performance in explaining adversarial classifications.
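To make the setting concrete, here is a minimal NumPy sketch, assuming a hypothetical toy two-layer ReLU network rather than the image classifiers studied in the paper. It contrasts the plain input gradient with the standard Guided Backpropagation rule (at each ReLU, pass the backward signal only where both the forward input and the incoming gradient are positive). The adversarial example is produced with the Fast Gradient Sign Method (FGSM), a standard attack swapped in purely for illustration. The paper's *enhanced* GuidedBP is not reproduced here, and all function names (`forward`, `saliency`, `fgsm`) and weights are illustrative assumptions.

```python
import numpy as np


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)


def forward(x, W1, b1, W2, b2):
    """Hypothetical toy model: a two-layer ReLU MLP.
    Returns hidden pre-activations and output logits."""
    h_pre = W1 @ x + b1
    logits = W2 @ relu(h_pre) + b2
    return h_pre, logits


def saliency(x, W1, b1, W2, b2, target, guided=True):
    """Saliency map of logit `target` with respect to the input x.

    guided=False: plain gradient (ReLU passes the signal where its forward input > 0).
    guided=True:  Guided Backpropagation rule; the ReLU additionally zeroes out
                  negative backward signals.
    """
    h_pre, _ = forward(x, W1, b1, W2, b2)
    g_h = W2[target].copy()               # d logit_target / d relu(h_pre)
    mask = (h_pre > 0).astype(float)      # forward ReLU gate
    if guided:
        mask *= (g_h > 0)                 # backward gate: keep positive signals only
    g_pre = g_h * mask
    return W1.T @ g_pre                   # project the signal back to input space


def fgsm(x, y, W1, b1, W2, b2, eps):
    """One signed-gradient step that increases the cross-entropy loss of label y
    (FGSM, used here only to create an adversarial input for the toy model)."""
    h_pre, logits = forward(x, W1, b1, W2, b2)
    p = np.exp(logits - logits.max())
    p /= p.sum()
    p[y] -= 1.0                           # d CE / d logits = softmax - onehot(y)
    g_pre = (W2.T @ p) * (h_pre > 0)      # backprop through the ReLU
    return x + eps * np.sign(W1.T @ g_pre)


# Usage: explain the (possibly misled) prediction on an adversarial input.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)
W2, b2 = rng.normal(size=(3, 16)), rng.normal(size=3)
x = rng.normal(size=8)
y = int(np.argmax(forward(x, W1, b1, W2, b2)[1]))        # clean prediction
x_adv = fgsm(x, y, W1, b1, W2, b2, eps=0.1)
y_adv = int(np.argmax(forward(x_adv, W1, b1, W2, b2)[1]))  # adversarial prediction
s_guided = saliency(x_adv, W1, b1, W2, b2, target=y_adv)
s_plain = saliency(x_adv, W1, b1, W2, b2, target=y_adv, guided=False)
print("clean:", y, "adversarial:", y_adv)
print("guided map:", np.round(s_guided, 3))
print("plain gradient:", np.round(s_plain, 3))
```

Whether the prediction actually flips depends on the random weights and `eps`; the point of the sketch is only the mechanics. The guided map is computed for the adversarial class at `x_adv`, which is the kind of misled decision the paper sets out to explain.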

