Net2Vec: Quantifying and Explaining how Concepts are Encoded by Filters in Deep Neural Networks
This work addresses the challenge of interpreting neural network representations and is aimed at researchers in explainable AI, though its advance in the analysis of distributed representations is incremental.
The paper studies how semantic concepts are encoded in deep neural networks. It shows that single-filter interpretations are often unrepresentative and introduces Net2Vec, which maps concepts to vector embeddings built from multiple filters. The analysis reveals that concepts typically require multiple filters and that filters often encode multiple concepts.
In an effort to understand the meaning of the intermediate representations captured by deep networks, recent papers have tried to associate specific semantic concepts with individual neural network filter responses, where interesting correlations are often found, largely by focusing on extremal filter responses. In this paper, we show that this approach can favor easy-to-interpret cases that are not necessarily representative of the average behavior of a representation. A more realistic but harder-to-study hypothesis is that semantic representations are distributed, and thus filters must be studied in conjunction. In order to investigate this idea while enabling systematic visualization and quantification of multiple filter responses, we introduce the Net2Vec framework, in which semantic concepts are mapped to vectorial embeddings based on corresponding filter responses. By studying such embeddings, we are able to show that (1) in most cases, multiple filters are required to code for a concept; (2) filters are often not concept-specific and help encode multiple concepts; and (3) compared to single-filter activations, filter embeddings better characterize the meaning of a representation and its relationship to other concepts.
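The idea of mapping a concept to a vector over filter responses can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the embedding is a weight vector (one weight per filter) fit by plain logistic regression so that the weighted sum of filter activations predicts a binary concept mask, and it runs on synthetic activations rather than a real network. All function names (`learn_concept_embedding`, `best_single_filter_iou`, `combined_iou`, `make_synthetic`) and the mean-activation threshold for single filters are hypothetical choices made for illustration.

```python
import numpy as np


def iou(pred, target):
    """Intersection over union of two boolean arrays."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union > 0 else 0.0


def learn_concept_embedding(acts, masks, steps=500, lr=0.5):
    """Fit weights w (one per filter) and a bias b by gradient descent on the
    logistic loss, so that sigmoid(sum_k w[k] * acts[:, k] + b) approximates
    the binary concept mask (an assumed training setup).

    acts:  float array of shape (N, K, H, W), filter activations
    masks: 0/1 array of shape (N, H, W), concept presence per location
    """
    n_filters = acts.shape[1]
    # One row per spatial location, one column per filter.
    x = acts.transpose(0, 2, 3, 1).reshape(-1, n_filters)
    y = masks.reshape(-1).astype(float)
    w = np.zeros(n_filters)
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad = p - y  # derivative of the logistic loss w.r.t. the logits
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def best_single_filter_iou(acts, masks):
    """Best IoU over filters, each thresholded at its own mean activation
    (a simplification chosen for this sketch)."""
    target = masks.astype(bool)
    return max(
        iou(acts[:, k] > acts[:, k].mean(), target)
        for k in range(acts.shape[1])
    )


def combined_iou(acts, masks, w, b):
    """IoU of the mask predicted by the weighted combination of filters."""
    logits = np.einsum("nkhw,k->nhw", acts, w) + b
    return iou(logits > 0, masks.astype(bool))


def make_synthetic(seed=0, n=64, k=8, h=8, w=8):
    """Synthetic data in which the concept is distributed over filters 0-2."""
    rng = np.random.default_rng(seed)
    acts = rng.normal(size=(n, k, h, w))
    masks = (acts[:, 0] + acts[:, 1] + acts[:, 2] > 1.0).astype(int)
    return acts, masks


if __name__ == "__main__":
    acts, masks = make_synthetic()
    w, b = learn_concept_embedding(acts, masks)
    print("concept embedding (weights per filter):", np.round(w, 2))
    print("best single-filter IoU:", best_single_filter_iou(acts, masks))
    print("combined-filter IoU:  ", combined_iou(acts, masks, w, b))
```

In this toy setting the concept is constructed to depend on three filters at once, so no single filter can segment it well, while the learned weight vector concentrates on those three filters. The learned vector `w` plays the role of the concept's embedding, and embeddings of different concepts could then be compared directly, for example by cosine similarity.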