LGAIJul 6, 2023

A Vulnerability of Attribution Methods Using Pre-Softmax Scores

arXiv:2307.03305v3h-index: 4
Originality Synthesis-oriented
AI Analysis

This addresses a reliability issue in explainable AI for researchers and practitioners, but it is incremental as it builds on known vulnerabilities in neural networks.

The paper identifies a vulnerability in a category of attribution methods for convolutional neural network classifiers, where small modifications to the model can affect the explanations without changing the model outputs.

We discuss a vulnerability involving a category of attribution methods used to provide explanations for the outputs of convolutional neural networks working as classifiers. It is known that this type of networks are vulnerable to adversarial attacks, in which imperceptible perturbations of the input may alter the outputs of the model. In contrast, here we focus on effects that small modifications in the model may cause on the attribution method without altering the model outputs.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes