LG · AI · CV · Nov 11, 2021

Defining and Quantifying the Emergence of Sparse Concepts in DNNs

arXiv:2111.06206v6 · 43 citations
Originality: Incremental advance
AI Analysis

This paper gives researchers and practitioners a method for interpreting DNNs, but the advance is incremental because it builds on existing concept-based explanation approaches.

The paper tackles the problem of explaining deep neural networks (DNNs). It shows that a DNN's inference score can be disentangled into the effects of a few interactive concepts, represented as a sparse causal graph, and it proves that this graph can mimic the DNN's outputs on an exponential number of masked samples.

This paper aims to illustrate the concept-emerging phenomenon in a trained DNN. Specifically, we find that the inference score of a DNN can be disentangled into the effects of a few interactive concepts. These concepts can be understood as causal patterns in a sparse, symbolic causal graph, which explains the DNN. The faithfulness of using such a causal graph to explain the DNN is theoretically guaranteed, because we prove that the causal graph can well mimic the DNN's outputs on an exponential number of different masked samples. Besides, such a causal graph can be further simplified and re-written as an And-Or graph (AOG), without losing much explanation accuracy.
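The disentanglement described above can be sketched on a toy example. The abstract does not spell out the interaction formula, so this sketch assumes the effect of each concept S is the Harsanyi dividend, I(S) = Σ_{T⊆S} (-1)^{|S|-|T|} v(T), where v(T) is the model output when only the variables in T are left unmasked. The function `toy_model` is a hypothetical stand-in for a DNN, and all names here are illustrative rather than taken from the paper's code.

```python
from itertools import combinations

def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def toy_model(present):
    """Hypothetical stand-in for a DNN score on a masked sample.

    `present` is the set of input variables left unmasked.
    """
    score = 1.0                  # output when every variable is masked
    if {0, 1} <= present:        # AND pattern over variables 0 and 1
        score += 2.0
    if {1, 2, 3} <= present:     # AND pattern over variables 1, 2 and 3
        score -= 1.5
    if 3 in present:             # single-variable pattern
        score += 0.5
    return score

def harsanyi_effects(v, n):
    """Assumed definition: I(S) = sum_{T subset of S} (-1)^(|S|-|T|) * v(T)."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
        for S in subsets(range(n))
    }

def reconstruct(effects, present):
    """Sum the effects of all concepts fully contained in `present`."""
    return sum(w for S, w in effects.items() if S <= present)

N = 4
effects = harsanyi_effects(toy_model, N)

# Keep only concepts with a non-negligible effect (the sparse graph).
salient = {S: w for S, w in effects.items() if abs(w) > 1e-9}

# Compare the sparse reconstruction with the model on all 2^N masked samples.
max_error = max(
    abs(reconstruct(salient, S) - toy_model(S)) for S in subsets(range(N))
)
print(len(effects), "candidate concepts,", len(salient), "salient")
print("max reconstruction error:", max_error)
```

In this toy setting only a handful of the 2^N candidate concepts carry a non-zero effect, yet summing them reproduces the model's score on every masked sample. A real DNN would not be exactly sparse, so the threshold for "salient" would have to be chosen empirically.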

Code Implementations: 1 repo
