Extracting Interpretable Concept-Based Decision Trees from CNNs
This work addresses the need for interpretability in deep learning models and is aimed at researchers and practitioners who want human-in-the-loop understanding of the discriminative concepts a CNN relies on. Its contribution is incremental, since it builds on existing concept extraction and decision tree methods.
The paper tackles the problem of interpreting CNN decisions by extracting human-understandable concepts from hidden layer activations and representing them with a shallow decision tree. The extracted tree accurately reproduces the original CNN's classifications at low tree depths.
To better understand how convolutional neural networks (CNNs) reason about human-understandable concepts, we present a method that infers labeled concept data from hidden layer activations and interprets the concepts through a shallow decision tree. The decision tree shows which concepts a model deems important and how those concepts interact with each other. Experiments demonstrate that the extracted decision tree accurately represents the original CNN's classifications at low tree depths, thus encouraging human-in-the-loop understanding of discriminative concepts.
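The idea of fitting a shallow tree to a CNN's outputs and checking how well it agrees can be illustrated with a minimal sketch. It is not the method used in the paper: it assumes the concept labels have already been inferred and are binary, it uses a plain greedy Gini split (CART-style) as a stand-in for whatever tree learner the authors use, and the concept names, the `toy_cnn_label` function, and the data are hypothetical. Fidelity here means the fraction of inputs on which the tree and the (toy) CNN give the same class.

```python
from collections import Counter
from itertools import product


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_tree(X, y, max_depth):
    """Greedy Gini tree over binary concept features.

    A leaf is a class label (str); an internal node is a tuple
    (concept_index, subtree_if_absent, subtree_if_present).
    """
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority
    best = None
    for j in range(len(X[0])):
        absent = [i for i, x in enumerate(X) if x[j] == 0]
        present = [i for i, x in enumerate(X) if x[j] == 1]
        if not absent or not present:
            continue  # this concept does not split the data
        score = (len(absent) * gini([y[i] for i in absent])
                 + len(present) * gini([y[i] for i in present])) / len(y)
        if best is None or score < best[0]:
            best = (score, j, absent, present)
    if best is None:
        return majority
    _, j, absent, present = best
    return (
        j,
        fit_tree([X[i] for i in absent], [y[i] for i in absent], max_depth - 1),
        fit_tree([X[i] for i in present], [y[i] for i in present], max_depth - 1),
    )


def predict(tree, x):
    """Walk the tree until a leaf (class label) is reached."""
    while isinstance(tree, tuple):
        j, if_absent, if_present = tree
        tree = if_present if x[j] else if_absent
    return tree


def fidelity(tree, X, cnn_labels):
    """Fraction of samples where the tree agrees with the CNN's label."""
    return sum(predict(tree, x) == t for x, t in zip(X, cnn_labels)) / len(X)


# Hypothetical concepts and a hypothetical stand-in for the CNN's decisions.
CONCEPTS = ["stripes", "four_legs", "grass"]


def toy_cnn_label(x):
    stripes, four_legs, _grass = x
    if not four_legs:
        return "other"
    return "zebra" if stripes else "horse"


X = [list(bits) for bits in product([0, 1], repeat=len(CONCEPTS))]
cnn_labels = [toy_cnn_label(x) for x in X]

for depth in range(3):
    tree = fit_tree(X, cnn_labels, depth)
    print(depth, fidelity(tree, X, cnn_labels))
```

On this toy data the tree splits first on `four_legs` and then on `stripes`, and it never uses `grass`. Fidelity rises from 0.5 at depth 0 to 0.75 at depth 1 and 1.0 at depth 2. That is the kind of reading a shallow tree is meant to support: which concepts matter, and in what order they interact.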