Comprehensible Convolutional Neural Networks via Guided Concept Learning
This work addresses the problem of improving the transparency and trustworthiness of Deep Neural Networks for end-users by learning human-perceivable concepts.
This paper proposes a guided learning approach with an additional concept layer in a CNN to learn associations between visual features and word phrases. The model learns concepts consistent with human perception and their contributions to decisions without compromising accuracy, and these concepts are transferable to new object classes.
Learning concepts that are consistent with human perception is important for Deep Neural Networks to win end-user trust. Post-hoc interpretation methods lack transparency in the feature representations learned by the models. This work proposes a guided learning approach with an additional concept layer in a CNN- based architecture to learn the associations between visual features and word phrases. We design an objective function that optimizes both prediction accuracy and semantics of the learned feature representations. Experiment results demonstrate that the proposed model can learn concepts that are consistent with human perception and their corresponding contributions to the model decision without compromising accuracy. Further, these learned concepts are transferable to new classes of objects that have similar concepts.