LG | AI | ML | Feb 11, 2018

Influence-Directed Explanations for Deep Convolutional Networks

arXiv:1802.03788v2 | 80 citations
Originality: Incremental advance
AI Analysis

The paper provides a method for interpreting deep convolutional networks by examining their internal neurons, an incremental advance in explainable AI aimed at researchers and practitioners.

The paper tackles the problem of explaining behavioral properties of deep neural networks. It develops an influence-directed approach that identifies influential neurons and interprets the concepts they represent, and it demonstrates three capabilities on ImageNet-trained CNNs: identifying concepts that generalize across instances, extracting class essences, and isolating the features used in decisions.

We study the problem of explaining a rich class of behavioral properties of deep neural networks. Distinctively, our influence-directed explanations approach this problem by peering inside the network to identify neurons with high influence on a quantity and distribution of interest, using an axiomatically-justified influence measure, and then providing an interpretation for the concepts these neurons represent. We evaluate our approach by demonstrating a number of its unique capabilities on convolutional neural networks trained on ImageNet. Our evaluation demonstrates that influence-directed explanations (1) identify influential concepts that generalize across instances, (2) can be used to extract the "essence" of what the network learned about a class, and (3) isolate individual features the network uses to make decisions and distinguish related classes.
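To make the idea of "neurons with high influence on a quantity and distribution of interest" more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that influence can be read as the gradient of the quantity of interest with respect to an internal neuron's activation, averaged over samples drawn from the distribution of interest; the abstract itself does not spell out the formula. The tiny two-slice network and the names `hidden`, `quantity`, and `influence` are hypothetical and chosen only for illustration.

```python
import math
import random

def hidden(x, W1):
    """Lower slice of the toy network: input -> internal activations z (ReLU)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]

def quantity(z, W2, c, c_other=None):
    """Upper slice plus quantity of interest (assumed form).

    Returns the logit of class c, or the difference between the logits of
    c and c_other when a comparative quantity is requested.
    """
    t = [math.tanh(v) for v in z]
    def logit(k):
        return sum(w * ti for w, ti in zip(W2[k], t))
    return logit(c) - (logit(c_other) if c_other is not None else 0.0)

def influence(samples, W1, W2, c, c_other=None):
    """Assumed influence: mean of d(quantity)/d(z_j) over the given samples."""
    n_hidden = len(W1)
    totals = [0.0] * n_hidden
    for x in samples:
        z = hidden(x, W1)
        for j in range(n_hidden):
            w = W2[c][j] - (W2[c_other][j] if c_other is not None else 0.0)
            # Analytic derivative of w * tanh(z_j) with respect to z_j.
            totals[j] += w * (1.0 - math.tanh(z[j]) ** 2)
    return [t / len(samples) for t in totals]

# Toy data: 4 inputs, 5 internal neurons, 3 classes, 32 samples.
random.seed(0)
W1 = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
W2 = [[random.gauss(0, 1) for _ in range(5)] for _ in range(3)]
samples = [[random.gauss(0, 1) for _ in range(4)] for _ in range(32)]

# Influence of each internal neuron on "class 0 rather than class 1".
infl = influence(samples, W1, W2, c=0, c_other=1)
# Neuron indices ordered from most to least influential (by magnitude).
ranked = sorted(range(len(infl)), key=lambda j: -abs(infl[j]))
```

In this sketch, changing `samples` changes the distribution of interest (a single instance, one class, or the whole dataset), and passing `c_other` switches from a single-class quantity to a comparative one, which loosely mirrors the abstract's goal of distinguishing related classes. Interpreting what the top-ranked neurons represent is a separate step that this example does not cover.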

Code Implementations: 2 repos
