CV · Jan 8, 2019

Interpretable BoW Networks for Adversarial Example Detection

arXiv:1901.02229v1 · 5 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for interpretability and security in deep learning, particularly for detecting adversarial attacks. It is an incremental contribution, as it builds on existing BoW and GAN concepts.

The paper tackles the problem of interpreting CNN predictions and detecting adversarial examples by introducing an interpretable Bag of Words (BoW) network that associates a visual and semantic meaning with its codewords. It then uses this network to outperform state-of-the-art adversarial detection methods across various attack strategies.
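The Bag of Words encoding that underlies this kind of network can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a soft-assignment scheme (a softmax over negative squared distances to each codeword, scaled by a hypothetical temperature `alpha`) averaged over the spatial locations of a CNN feature map, and all function names are hypothetical. It shows how a per-image codeword histogram is formed and how the most highly activated codeword is picked. That codeword's visual representation is what one would inspect to interpret a prediction.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def soft_assign(feature: Vector, codebook: Sequence[Vector], alpha: float = 1.0) -> List[float]:
    """Soft-assign one local feature to every codeword.

    Uses a softmax over -alpha * squared Euclidean distance (an assumed
    scheme; alpha is a hypothetical temperature parameter).
    """
    logits = [-alpha * sum((f - c) ** 2 for f, c in zip(feature, word)) for word in codebook]
    shift = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bow_histogram(local_features: Sequence[Vector], codebook: Sequence[Vector], alpha: float = 1.0) -> List[float]:
    """Average the soft assignments over all spatial locations of a feature map."""
    hist = [0.0] * len(codebook)
    for feature in local_features:
        for k, weight in enumerate(soft_assign(feature, codebook, alpha)):
            hist[k] += weight
    return [h / len(local_features) for h in hist]


def most_activated_codeword(hist: Sequence[float]) -> int:
    """Index of the codeword with the largest activation, used for interpretation."""
    return max(range(len(hist)), key=lambda k: hist[k])


# Toy example: three 2-D codewords and three local features from one image.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
local_features = [[0.9, 1.1], [1.0, 0.8], [0.1, 0.0]]
hist = bow_histogram(local_features, codebook)
print(most_activated_codeword(hist))  # prints 1
```

Because each soft assignment sums to one, the averaged histogram also sums to one. This keeps codeword activations comparable across images with different numbers of spatial locations.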

The standard approach to providing interpretability to deep convolutional neural networks (CNNs) consists of visualizing either their feature maps or the image regions that contribute the most to the prediction. In this paper, we introduce an alternative strategy to interpret the results of a CNN. To this end, we leverage a Bag of visual Words (BoW) representation within the network and associate a visual and semantic meaning with the corresponding codebook elements via the use of a generative adversarial network (GAN). The reason behind the prediction for a new sample can then be interpreted by looking at the visual representation of the most highly activated codeword. We then propose to exploit our interpretable BoW networks for adversarial example detection. To this end, we build upon the intuition that, while adversarial samples look very similar to real images, they should activate codewords with a significantly different visual representation in order to produce incorrect predictions. We therefore cast the adversarial example detection problem as that of comparing the input image with the most highly activated visual codeword. As evidenced by our experiments, this allows us to outperform the state-of-the-art adversarial example detection methods on standard benchmarks, independently of the attack strategy.
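The detection criterion described above can be sketched under strong assumptions. The criterion compares the input image with the visual representation of its most highly activated codeword. In this sketch, each codeword's visual representation (produced by a GAN in the paper) is replaced by a precomputed vector in the same space as the input representation, and the comparison is a plain Euclidean distance. The decision threshold is chosen on clean images to fix an approximate false-positive rate. All function and parameter names are hypothetical, and the paper's actual comparison mechanism may differ.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance (an assumed comparison metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def detection_score(input_repr: Vector, hist: Sequence[float], codeword_reprs: Sequence[Vector]) -> float:
    """Distance between the input and the visual representation of its top codeword."""
    top = max(range(len(hist)), key=lambda k: hist[k])
    return l2_distance(input_repr, codeword_reprs[top])


def calibrate_threshold(clean_scores: Sequence[float], percentile: float = 95.0) -> float:
    """Pick a threshold so that roughly `percentile`% of clean scores fall at or below it."""
    if not clean_scores:
        raise ValueError("need at least one clean score")
    ordered = sorted(clean_scores)
    idx = max(0, math.ceil(percentile * len(ordered) / 100.0) - 1)
    return ordered[min(idx, len(ordered) - 1)]


def is_adversarial(input_repr: Vector, hist: Sequence[float], codeword_reprs: Sequence[Vector], threshold: float) -> bool:
    """Flag the input when it looks too different from its top codeword's representation."""
    return detection_score(input_repr, hist, codeword_reprs) > threshold


# Toy example: the same input activates a nearby codeword (clean) or a distant one (attacked).
codeword_reprs = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
x = [1.0, 0.9]
print(is_adversarial(x, [0.1, 0.8, 0.1], codeword_reprs, threshold=1.0))  # prints False
print(is_adversarial(x, [0.1, 0.2, 0.7], codeword_reprs, threshold=1.0))  # prints True
```

Calibrating the threshold only on clean images means the detector needs no adversarial examples at training time. This fits the claim that detection works independently of the attack strategy.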
