LG · ML · Feb 5, 2021

Interpretable Neural Networks based classifiers for categorical inputs

arXiv:2102.03202v1
Originality: Incremental advance
AI Analysis

This work addresses the problem of interpreting neural network classifiers for human-sensitive applications, which is important for practitioners needing to understand model decisions.

This paper introduces a method to interpret neural network classifiers with categorical inputs by mapping them to a physical energy model. This allows for the expansion of network layers, especially the logits layer, into terms that quantify the contribution of each input pattern to the classification, considering linear and pairwise dependencies.

Because of the pervasive use of neural networks in human-sensitive applications, their interpretability is becoming an increasingly important topic in machine learning. In this work we introduce a simple way to interpret the output function of a neural network classifier that takes categorical variables as input. By exploiting a mapping between a neural network classifier and a physical energy model, we show that in these cases each layer of the network, and the logits layer in particular, can be expanded as a sum of terms that account for the contribution of each input pattern to the classification. For instance, at first order the expansion considers only the linear relation between input features and output, while at second order pairwise dependencies between input features are also accounted for. The analysis of the contributions of each pattern, after an appropriate gauge transformation, is presented in two cases where the effectiveness of the method can be appreciated.
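To make the idea of a first- and second-order expansion concrete, the following is a minimal sketch and not the paper's own procedure. It assumes a small number of categorical variables so that all configurations can be enumerated, uses a uniform average over inputs, and fixes the gauge with a zero-sum convention (each term averages to zero over any one of its arguments). The function `toy_logit` is a hypothetical stand-in for one logit of a trained classifier; `mean_over` and `expand` are illustrative helper names.

```python
import itertools
import math


def toy_logit(x):
    """Hypothetical stand-in for one logit; x is a tuple of 3 ints in {0, 1, 2}."""
    w = [[0.5, -1.0, 0.2], [1.5, 0.3, -0.7], [0.0, 0.8, -0.4]]
    pre = sum(w[i][xi] for i, xi in enumerate(x))
    # tanh and the equality term make the function non-additive in the inputs
    return math.tanh(pre) + 0.3 * (x[0] == x[1])


def mean_over(f, n_vars, q, fixed):
    """Uniform average of f over all configurations, with `fixed` variables clamped."""
    free = [i for i in range(n_vars) if i not in fixed]
    total, count = 0.0, 0
    for vals in itertools.product(range(q), repeat=len(free)):
        x = [0] * n_vars
        for i, v in fixed.items():
            x[i] = v
        for i, v in zip(free, vals):
            x[i] = v
        total += f(tuple(x))
        count += 1
    return total / count


def expand(f, n_vars, q):
    """Return (f0, h, J): constant, single-variable and pairwise terms (zero-sum gauge)."""
    f0 = mean_over(f, n_vars, q, {})
    # First order: effect of fixing one variable, relative to the global mean
    h = {
        (i, a): mean_over(f, n_vars, q, {i: a}) - f0
        for i in range(n_vars)
        for a in range(q)
    }
    # Second order: what remains after removing the constant and single-variable terms
    J = {}
    for i, j in itertools.combinations(range(n_vars), 2):
        for a in range(q):
            for b in range(q):
                cond = mean_over(f, n_vars, q, {i: a, j: b})
                J[(i, j, a, b)] = cond - h[(i, a)] - h[(j, b)] - f0
    return f0, h, J


def second_order_approx(x, f0, h, J):
    """Evaluate the expansion truncated at second order for configuration x."""
    value = f0 + sum(h[(i, a)] for i, a in enumerate(x))
    for i, j in itertools.combinations(range(len(x)), 2):
        value += J[(i, j, x[i], x[j])]
    return value


f0, h, J = expand(toy_logit, n_vars=3, q=3)
x = (0, 2, 1)
print("exact logit:         ", toy_logit(x))
print("second-order approx.:", second_order_approx(x, f0, h, J))
```

Under these assumptions, the entries of `h` and `J` can be read as the contribution of each single value and each pair of values to the logit. The enumeration cost grows as `q ** n_vars`, so this brute-force form is only practical for toy problems.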

