LG · CV · Dec 23, 2020

Analyzing Representations inside Convolutional Neural Networks

arXiv:2012.12516v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of interpreting the internal representations of neural networks, which is crucial for applications like medical diagnosis.

This paper proposes an unsupervised framework to categorize concepts learned by a neural network by clustering input examples, neurons, and input features in a shared latent space. The method successfully extracts human-understandable and coherent concepts from a ResNet-18 trained on CIFAR-100.

How can we discover and succinctly summarize the concepts that a neural network has learned? This task matters wherever networks are used for classification-based inference, such as medical diagnosis from fMRI, X-ray, and similar imaging. In this work, we propose a framework that categorizes the concepts a network learns by clustering three things in the same latent space: a set of input examples, the neurons (grouped by the examples they activate for), and the input features. The framework is unsupervised and needs no labels for input features; it only requires access to the network's internal activations for each input example, which makes it widely applicable. We extensively evaluate the proposed method and demonstrate that it produces human-understandable and coherent concepts learned by a ResNet-18 on the CIFAR-100 dataset.
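To make the idea of clustering examples and neurons in one latent space more concrete, here is a minimal sketch. It is not the paper's method: it swaps in plain non-negative matrix factorization (NMF) on an examples-by-neurons activation matrix, and it leaves out the input-feature mode entirely. The function names `nmf` and `co_cluster`, the choice of multiplicative updates, and the synthetic activation matrix are all assumptions made for illustration.

```python
import numpy as np


def nmf(A, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate a non-negative matrix A (examples x neurons) as W @ H.

    Uses Lee-Seung multiplicative updates for the Frobenius objective.
    Illustrative stand-in only; not the factorization used in the paper.
    """
    rng = np.random.default_rng(seed)
    n, m = A.shape
    W = rng.random((n, k)) + eps  # example embeddings, shape (n, k)
    H = rng.random((k, m)) + eps  # neuron embeddings, shape (k, m)
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


def co_cluster(A, k, **kwargs):
    """Assign each example and each neuron to its strongest latent factor.

    Both assignments index the same k factors, so an example and a neuron
    with the same label share a latent "concept".
    """
    W, H = nmf(A, k, **kwargs)
    example_clusters = W.argmax(axis=1)
    neuron_clusters = H.argmax(axis=0)
    return example_clusters, neuron_clusters


# Synthetic activations (assumed data): 6 examples x 4 neurons with two
# blocks. Examples 0-2 excite neurons 0-1; examples 3-5 excite neurons 2-3.
rng = np.random.default_rng(1)
A = 0.01 * rng.random((6, 4))
A[:3, :2] += 1.0
A[3:, 2:] += 1.0

example_clusters, neuron_clusters = co_cluster(A, k=2)
print("example clusters:", example_clusters)
print("neuron clusters: ", neuron_clusters)
```

In a real setting, `A` would hold non-negative activations (for instance, post-ReLU values averaged over spatial positions) collected from a chosen layer for each input example. Which factor index ends up attached to which block depends on the random initialization, so only the grouping is meaningful, not the label values.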
