LG · CV · May 10, 2024

Linear Explanations for Individual Neurons

arXiv:2405.06855v1 · 19 citations · h-index: 24 · ICML
Originality: Incremental advance
AI Analysis

This work addresses interpretability for AI researchers and practitioners by offering a more comprehensive analysis of individual neurons, though it is an incremental improvement on existing explanation methods.

The paper tackles the problem of understanding individual neurons in neural networks. It shows that explaining only the highest activations is insufficient: they account for a small percentage of a neuron's causal effect, and inputs producing lower activations differ in ways that cannot be predicted from the top activations alone. It proposes explaining each neuron as a linear combination of concepts, develops an efficient method for producing these explanations, and evaluates them automatically through simulation in vision settings.

In recent years, many methods have been developed to understand the internal workings of neural networks, often by describing the function of individual neurons in the model. However, these methods typically focus only on explaining the very highest activations of a neuron. In this paper, we show this is not sufficient: the highest activation range is responsible for only a very small percentage of the neuron's causal effect. In addition, inputs causing lower activations are often very different and cannot be reliably predicted by looking only at high activations. We propose that neurons should instead be understood as a linear combination of concepts, and we develop an efficient method for producing these linear explanations. Finally, we show how to automatically evaluate description quality using simulation, i.e., predicting neuron activations on unseen inputs, in the vision setting.
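The abstract names two computations without spelling them out. The first explains a neuron as a weighted sum of concepts. The second scores that explanation by simulating the neuron's activations on unseen inputs. The sketch below illustrates both ideas under stated assumptions and is not the paper's method.

- Each concept is represented by a per-input score, for example 1 if the concept is present in the image and 0 otherwise. The toy data is hypothetical.
- Weights are chosen by plain greedy matching pursuit. This is a generic stand-in, not the authors' procedure.
- Simulation quality is measured with Pearson correlation between predicted and actual activations. This scoring choice is also an assumption.
- `fit_linear_explanation`, `simulate`, and `pearson` are hypothetical helper names.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical representation: concept name -> per-input concept score.
Concepts = Dict[str, Sequence[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def fit_linear_explanation(concepts: Concepts, acts: Sequence[float], k: int = 3) -> Dict[str, float]:
    """Greedy matching pursuit (an assumed stand-in, not the paper's algorithm).

    On each of up to k steps, pick the concept whose 1-D least-squares fit to
    the current residual removes the most squared error, then add its weight.
    """
    residual: List[float] = list(acts)
    weights: Dict[str, float] = {}
    for _ in range(k):
        best_name, best_gain, best_w = None, 0.0, 0.0
        for name, col in concepts.items():
            norm = dot(col, col)
            if norm == 0:
                continue
            w = dot(col, residual) / norm  # 1-D least-squares coefficient
            gain = w * w * norm            # reduction in squared error
            if gain > best_gain:
                best_name, best_gain, best_w = name, gain, w
        if best_name is None:              # no concept reduces the error
            break
        weights[best_name] = weights.get(best_name, 0.0) + best_w
        col = concepts[best_name]
        residual = [r - best_w * c for r, c in zip(residual, col)]
    return weights


def simulate(weights: Dict[str, float], concepts: Concepts) -> List[float]:
    """Predict activations on (unseen) inputs as the weighted sum of concept scores."""
    n = len(next(iter(concepts.values())))
    return [sum(w * concepts[name][i] for name, w in weights.items()) for i in range(n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


# Hypothetical toy data: the neuron responds as 2*dog + 1*cat.
train_concepts: Concepts = {
    "dog": [1, 0, 1, 0, 1, 0],
    "cat": [0, 1, 1, 0, 0, 1],
    "car": [0, 0, 0, 1, 1, 0],
}
train_acts = [2, 1, 3, 0, 2, 1]
heldout_concepts: Concepts = {
    "dog": [1, 0, 1, 0],
    "cat": [0, 1, 1, 1],
    "car": [1, 0, 0, 1],
}
heldout_acts = [2, 1, 3, 1]

explanation = fit_linear_explanation(train_concepts, train_acts, k=3)
score = pearson(simulate(explanation, heldout_concepts), heldout_acts)
print(explanation, round(score, 3))
```

On this toy neuron, the greedy fit selects "dog" and "cat" and never selects the irrelevant "car" concept. Because matching pursuit does not refit earlier weights, the recovered weights (about 2.04 and 0.89) only approximate the true 2 and 1. The held-out simulation still correlates strongly with the true activations. Scoring on held-out inputs reflects the abstract's point that a description should predict the neuron across its whole activation range, not just at the top.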

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
