LG · Feb 1, 2022

Exploring layerwise decision making in DNNs

arXiv:2202.00345v1
AI Analysis

This work addresses interpretability for researchers and practitioners who use DNNs. It is incremental in that it builds on existing feature attribution methods.

The paper tackles the problem of interpreting deep neural networks by extracting a decision tree from each layer of a ReLU-activated MLP, using binary encodings of the activations. It then combines these trees with feature attribution techniques to analyze layerwise behavior and sample groupings.

While deep neural networks (DNNs) have become a standard architecture for many machine learning tasks, their internal decision-making process and general interpretability are still poorly understood. Conversely, common decision trees are easily interpretable and theoretically well understood. We show that by encoding the discrete sample activation values of nodes as a binary representation, we are able to extract a decision tree explaining the classification procedure of each layer in a ReLU-activated multilayer perceptron (MLP). We then combine these decision trees with existing feature attribution techniques in order to produce an interpretation of each layer of a model. Finally, we provide an analysis of the generated interpretations, the behaviour of the binary encodings, and how these relate to sample groupings created during the training process of the neural network.
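To make the binary-encoding idea concrete, here is a minimal sketch in plain Python. It is not the paper's extraction procedure: the toy weights, the samples, and the helper names (`relu_layer`, `binary_code`, `layer_codes`, `group_by_code`) are all assumptions made for illustration. Each hidden unit contributes one bit (1 if its ReLU output is positive, 0 otherwise), and samples are grouped by the resulting code at a chosen layer. Since every bit is a yes/no test, a decision tree that splits on those bits could reproduce the same grouping.

```python
from collections import Counter, defaultdict

def relu_layer(x, weights, biases):
    """Post-ReLU activations of one dense layer for a single input vector."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def binary_code(activations):
    """One bit per unit: 1 if the unit is active (output > 0), else 0."""
    return tuple(1 if a > 0 else 0 for a in activations)

def layer_codes(x, layers):
    """Forward x through the MLP and collect the binary code of each layer."""
    codes = []
    for weights, biases in layers:
        x = relu_layer(x, weights, biases)
        codes.append(binary_code(x))
    return codes

def group_by_code(samples, labels, layers, layer_index):
    """Count class labels per binary code observed at the given layer."""
    groups = defaultdict(Counter)
    for x, y in zip(samples, labels):
        code = layer_codes(x, layers)[layer_index]
        groups[code][y] += 1
    return groups

# Hypothetical two-layer MLP with hand-picked weights (not from the paper).
LAYERS = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),    # layer 0: identity, then ReLU
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),  # layer 1: compares the two units
]
SAMPLES = [[2.0, 1.0], [1.0, 2.0], [-1.0, 3.0], [3.0, -1.0]]
LABELS = [0, 1, 1, 0]

if __name__ == "__main__":
    for index in range(len(LAYERS)):
        print("layer", index)
        for code, counts in group_by_code(SAMPLES, LABELS, LAYERS, index).items():
            print("  code", code, "-> label counts", dict(counts))
```

In this toy setup, the codes at the second layer separate the two classes, while the codes at the first layer still mix them. That loosely mirrors the kind of layerwise sample grouping the abstract describes, though a trained network would of course behave differently.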

