CV · Nov 25, 2019

Improving Feature Attribution through Input-specific Network Pruning

arXiv:1911.11081v2 · 13 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for clearer feature attribution in neural networks, which is crucial for interpretability in AI applications. Its contribution is incremental, since it builds on existing pruning and attribution techniques.

The paper tackles the problem of noisy or coarse gradient-based attribution in neural networks by proposing input-specific pruning to retain only highly contributing neurons, resulting in fine-grained attribution maps that outperform other methods across multiple benchmarks.
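The pruning idea can be illustrated on a toy network. The sketch below is a minimal, hypothetical illustration and not the authors' implementation: it assumes a one-hidden-layer ReLU network written in plain Python, scores each hidden neuron for one specific input by activation × gradient (a first-order Taylor estimate of its contribution to the output), keeps the `keep` highest-scoring neurons, and then uses the input gradient of the pruned network as the attribution map. The names `forward`, `contributions`, `input_specific_mask`, `input_gradient`, and `keep` are illustrative assumptions; the paper's actual scoring rule and pruning schedule may differ.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def forward(x: Vector, W1: Matrix, b1: Vector, w2: Vector, b2: float,
            mask: Vector) -> Tuple[float, Vector, Vector]:
    """One-hidden-layer ReLU network; mask[j] = 0.0 removes hidden neuron j."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    h = [max(0.0, p) * m for p, m in zip(pre, mask)]
    y = sum(w * hj for w, hj in zip(w2, h)) + b2
    return y, pre, h


def contributions(x: Vector, W1: Matrix, b1: Vector, w2: Vector,
                  b2: float) -> Vector:
    """Hypothetical score: activation times gradient (first-order Taylor term).

    The output layer is linear, so dy/dh_j = w2[j] and the score is h_j * w2[j].
    """
    _, _, h = forward(x, W1, b1, w2, b2, [1.0] * len(W1))
    return [hj * wj for hj, wj in zip(h, w2)]


def input_specific_mask(x: Vector, W1: Matrix, b1: Vector, w2: Vector,
                        b2: float, keep: int) -> Vector:
    """Keep the `keep` hidden neurons with the largest contribution for this x."""
    scores = contributions(x, W1, b1, w2, b2)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    kept = set(ranked[:keep])
    return [1.0 if j in kept else 0.0 for j in range(len(scores))]


def input_gradient(x: Vector, W1: Matrix, b1: Vector, w2: Vector, b2: float,
                   mask: Vector) -> Vector:
    """dy/dx of the (possibly pruned) network: sum over active, unmasked neurons."""
    _, pre, _ = forward(x, W1, b1, w2, b2, mask)
    grad = [0.0] * len(x)
    for j, row in enumerate(W1):
        if mask[j] and pre[j] > 0:  # ReLU passes gradient only when active
            for i, wij in enumerate(row):
                grad[i] += w2[j] * wij
    return grad
```

For example, with `W1 = [[1, 0], [0, 1], [1, 1]]`, zero biases, `w2 = [2, -0.1, 1]`, `x = [1, 1]` and `keep=2`, the neuron with a negative contribution is removed and the input gradient changes from about `[3.0, 0.9]` to `[3.0, 1.0]`. In a network this shallow, pruning only drops terms from the gradient sum; the shift from local to global importance described in the paper concerns deeper architectures.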

Attributing the output of a neural network to the contributions of its input elements is one way of shedding light on the black-box nature of neural networks. Because of the complexity of modern network architectures, current gradient-based attribution methods produce very noisy or coarse results. We propose to prune a neural network for a single given input so that only the neurons that contribute highly to the prediction are kept. We show that, with input-specific pruning, network gradients change from reflecting local (noisy) importance information to reflecting global importance. Our proposed method is efficient and generates fine-grained attribution maps. We further provide a theoretical justification of the pruning approach that relates it to perturbations, and we validate it through a novel experimental setup. Our method is evaluated on multiple benchmarks: sanity checks, pixel perturbation, and Remove-and-Retrain (ROAR). These benchmarks assess the method from different perspectives, and our method performs better than other methods across all evaluations.
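Of the listed benchmarks, pixel perturbation is the simplest to sketch. The function below is a generic, hypothetical version and not the paper's exact protocol: it assumes the variant that replaces the `k` least-salient inputs (ranked by absolute saliency) with a baseline value and reports the absolute fractional change in the model output, so a smaller change suggests a more faithful attribution map. `pixel_perturbation_score`, `baseline`, and the toy linear model in the usage note are assumptions for illustration.

```python
from typing import Callable, List


def pixel_perturbation_score(model: Callable[[List[float]], float],
                             x: List[float], saliency: List[float],
                             k: int, baseline: float = 0.0) -> float:
    """Hypothetical pixel-perturbation check.

    Replaces the k inputs with the smallest |saliency| by `baseline` and
    returns |f(x') - f(x)| / |f(x)|. Lower is better: a faithful map should
    mark the removed inputs as unimportant.
    """
    original = model(x)
    if original == 0:
        raise ValueError("fractional change is undefined when f(x) == 0")
    # Indices sorted from least to most salient (by absolute value).
    order = sorted(range(len(x)), key=lambda i: abs(saliency[i]))
    perturbed = list(x)
    for i in order[:k]:
        perturbed[i] = baseline
    return abs(model(perturbed) - original) / abs(original)
```

For a linear model `f(v) = 3*v[0] + 0.5*v[1] + 0.1*v[2]` at `x = [1, 1, 1]` with `k=1`, a saliency map that matches the weights removes `v[2]` and changes the output by about 3%, while a reversed map removes `v[0]` and changes it by about 83%. ROAR differs in that the model is retrained on the perturbed data before accuracy is measured, which this sketch does not attempt.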
