CV · CR · LG · Jul 25, 2019

How to Manipulate CNNs to Make Them Lie: the GradCAM Case

arXiv:1907.10901v2 · 30 citations
Originality: Highly original
AI Analysis

This work raises security concerns for applications like medical diagnosis where model explanations are used for decision-making, highlighting a novel attack vector on interpretability methods.

The paper examines the vulnerability of the GradCAM explanation method for CNNs. It demonstrates that an adversary who manipulates the model's weights and architecture can produce any desired explanation without significantly affecting accuracy. It then combines model manipulation with input manipulation to plant a backdoor that triggers a malicious explanation only when a specific pattern is present in the input.

Recently many methods have been introduced to explain CNN decisions. However, it has been shown that some methods can be sensitive to manipulation of the input. We continue this line of work and investigate the explanation method GradCAM. Instead of manipulating the input, we consider an adversary that manipulates the model itself to attack the explanation. By changing weights and architecture, we demonstrate that it is possible to generate any desired explanation, while leaving the model's accuracy essentially unchanged. This illustrates that GradCAM cannot explain the decision of every CNN and provides a proof of concept showing that it is possible to obfuscate the inner workings of a CNN. Finally, we combine input and model manipulation. To this end we put a backdoor in the network: the explanation is correct unless there is a specific pattern present in the input, which triggers a malicious explanation. Our work raises new security concerns, especially in settings where explanations of models may be used to make decisions, such as in the medical domain.
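The core idea of model manipulation can be illustrated with a toy sketch. The code below is not the paper's construction; it is a simplified NumPy example under stated assumptions: the network head is global average pooling followed by a linear layer, the feature maps of the last convolutional layer are given directly as an array, and the adversary's architecture change is modeled by appending one extra channel that always outputs a fixed target pattern. The function names (`logits`, `gradcam`, `add_constant_channel`) and the `scale` parameter are hypothetical. Under these assumptions the class scores stay the same while the GradCAM map is dominated by the target pattern.

```python
import numpy as np

def logits(A, W, b):
    """Class scores: global average pooling over feature maps A (K, H, W), then a linear layer."""
    return W @ A.mean(axis=(1, 2)) + b

def gradcam(A, W, cls):
    """GradCAM map for a GAP + linear head.

    For this head, d y_c / d A[k, i, j] = W[c, k] / (H * W) everywhere,
    so the channel weight alpha_k is W[c, k] / (H * W).
    """
    _, H, Wd = A.shape
    alpha = W[cls] / (H * Wd)
    return np.maximum(0.0, np.tensordot(alpha, A, axes=1))

def add_constant_channel(A, W, b, target, scale=1e3):
    """Hypothetical manipulation: append a channel that always equals `target`.

    The new channel gets a large weight `scale` for every class, and the bias
    is shifted by the same amount so the class scores do not change.
    """
    A2 = np.concatenate([A, target[None]], axis=0)
    W2 = np.concatenate([W, np.full((W.shape[0], 1), scale)], axis=1)
    b2 = b - scale * target.mean()
    return A2, W2, b2

# Toy data: 8 feature maps of size 7x7, 3 classes (all values are made up).
rng = np.random.default_rng(0)
A = rng.uniform(0.0, 1.0, size=(8, 7, 7))
W = rng.normal(size=(3, 8))
b = rng.normal(size=3)

# Desired (malicious) explanation: a bright square in the top-left corner.
target = np.zeros((7, 7))
target[:3, :3] = 1.0

A2, W2, b2 = add_constant_channel(A, W, b, target)
cls = int(np.argmax(logits(A, W, b)))
cam_original = gradcam(A, W, cls)
cam_manipulated = gradcam(A2, W2, cls)
```

In this sketch `logits(A2, W2, b2)` matches `logits(A, W, b)` up to floating-point error, while `cam_manipulated`, once normalized by its maximum, is close to `target`. The extra channel here ignores the input entirely; a backdoor variant would instead make it respond only when a trigger pattern is present in the input.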
