LG · CV · Oct 4, 2021

Fine-Grained Neural Network Explanation by Identifying Input Features with Predictive Information

arXiv:2110.01471v2, 33 citations
Originality: Incremental advance
AI Analysis

The work addresses the need for more interpretable AI models, particularly for users in fields that require transparency. It is incremental in that it builds on existing predictive-information concepts.

The paper tackles the problem of explaining black-box neural networks by identifying which input features contain predictive information. It proposes a method that works directly in the input domain and is architecture-agnostic, which yields fine-grained feature identification.

One principal approach for illuminating a black-box neural network is feature attribution, i.e. identifying the importance of input features for the network's prediction. The predictive information of features has recently been proposed as a proxy for their importance. So far, predictive information has only been identified for latent features, by placing an information bottleneck within the network. We propose a method to identify features with predictive information in the input domain. The method yields fine-grained identification of the information in input features and is agnostic to network architecture. The core idea of our method is to leverage a bottleneck on the input that only lets input features associated with predictive latent features pass through. We compare our method with several feature attribution methods using mainstream feature attribution evaluation experiments. The code is publicly available.
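
To make the idea of a bottleneck on the input more concrete, here is a minimal sketch. It is not taken from the paper's code. It assumes the noise-injection form commonly used for information-bottleneck attribution, in which each feature is blended with Gaussian noise according to a mask value in [0, 1], and the information that passes through is measured by a Gaussian KL divergence. The function names, the hand-set mask, and the values of `mu` and `sigma` are all hypothetical; in an actual method the mask would be optimized rather than fixed.

```python
import math
import random

def apply_input_bottleneck(x, mask, mu, sigma, rng):
    """Blend each input feature with Gaussian noise: z = m*x + (1-m)*eps.

    Assumed form (not confirmed by the abstract): m = 1 passes the
    feature unchanged, m = 0 replaces it entirely with noise.
    """
    z = []
    for xi, mi in zip(x, mask):
        eps = rng.gauss(mu, sigma)
        z.append(mi * xi + (1.0 - mi) * eps)
    return z

def information_per_feature(x, mask, mu, sigma, tiny=1e-12):
    """Per-feature KL( N(m*x + (1-m)*mu, ((1-m)*sigma)^2) || N(mu, sigma^2) ), in nats.

    This is the closed-form KL between two univariate Gaussians; it is
    0 when m = 0 and grows as m approaches 1.
    """
    info = []
    for xi, mi in zip(x, mask):
        keep_noise = max(1.0 - mi, tiny)   # clamp to avoid log(0) when m = 1
        shift = mi * (xi - mu) / sigma     # normalized shift of the mean
        info.append(-math.log(keep_noise) + 0.5 * (keep_noise ** 2 + shift ** 2) - 0.5)
    return info

# Hypothetical toy input with four features and a hand-set mask.
x = [0.2, 1.5, -0.7, 0.9]
mask = [0.0, 0.9, 0.1, 0.5]
rng = random.Random(0)

z = apply_input_bottleneck(x, mask, mu=0.0, sigma=1.0, rng=rng)
info = information_per_feature(x, mask, mu=0.0, sigma=1.0)
```

Under these assumptions, a feature whose mask value is 0 carries no information through the bottleneck, while features with larger mask values carry more. The per-feature values in `info` play the role of an attribution map.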

Code Implementations: 1 repo
