EDUCE: Explaining model Decisions through Unsupervised Concepts Extraction
This work addresses the need for explainable AI in text processing, offering an unsupervised method that avoids concept-level annotations. It is incremental, however, as it builds on existing interpretability approaches.
The authors tackled the problem of providing interpretable explanations for model predictions in text processing by proposing a self-interpretable model that predicts outputs based on automatically extracted binary concepts from text excerpts, demonstrating its relevance on text classification and multi-sentiment analysis tasks.
Providing explanations along with predictions is crucial in some text processing tasks. Therefore, we propose a new self-interpretable model that performs output prediction and simultaneously provides an explanation in terms of the presence of particular concepts in the input. To do so, our model's prediction relies solely on a low-dimensional binary representation of the input, where each feature denotes the presence or absence of a concept. The presence of a concept is decided from an excerpt, i.e., a small sequence of consecutive words in the text. Relevant concepts for the prediction task at hand are automatically defined by our model, avoiding the need for concept-level annotations. To ease interpretability, we enforce that, for each concept, the corresponding excerpts share similar semantics and are distinguishable from those of other concepts. We experimentally demonstrate the relevance of our approach on text classification and multi-sentiment analysis tasks.
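The prediction pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: in the proposed model the concept detectors and the classifier are learned, whereas here hand-written keyword scorers and fixed weights stand in for them. All names (`excerpts`, `concept_vector`, `predict`, `lexicon_scorer`), the threshold, and the maximum excerpt length are assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Excerpt = Tuple[str, ...]
Scorer = Callable[[Excerpt], float]


def excerpts(tokens: Sequence[str], max_len: int = 3) -> List[Excerpt]:
    """Enumerate every span of 1..max_len consecutive tokens."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    ]


def concept_vector(
    tokens: Sequence[str],
    scorers: Sequence[Scorer],
    threshold: float = 0.5,   # assumed cut-off, not taken from the paper
    max_len: int = 3,         # assumed maximum excerpt length
) -> Tuple[List[int], List[Optional[Excerpt]]]:
    """Build the binary concept representation and its supporting excerpts.

    For each concept, the best-scoring excerpt is selected; the concept is
    marked present (1) only if that excerpt's score exceeds the threshold.
    """
    spans = excerpts(tokens, max_len)
    z: List[int] = []
    evidence: List[Optional[Excerpt]] = []
    for score in scorers:
        best = max(spans, key=score) if spans else None
        present = best is not None and score(best) > threshold
        z.append(1 if present else 0)
        evidence.append(best if present else None)
    return z, evidence


def predict(z: Sequence[int], weights: Sequence[Sequence[float]],
            bias: Sequence[float]) -> int:
    """Linear classifier that sees only the binary concept vector z."""
    logits = [b + sum(w * zi for w, zi in zip(row, z))
              for row, b in zip(weights, bias)]
    return max(range(len(logits)), key=logits.__getitem__)


def lexicon_scorer(lexicon: set) -> Scorer:
    """Toy stand-in for a learned concept detector: fraction of lexicon words."""
    return lambda span: sum(w in lexicon for w in span) / len(span)


# Two hypothetical concepts: "praise" and "criticism".
scorers = [lexicon_scorer({"great", "excellent"}),
           lexicon_scorer({"awful", "boring"})]
weights = [[2.0, -2.0],   # class 0 (positive) favors "praise"
           [-2.0, 2.0]]   # class 1 (negative) favors "criticism"
bias = [0.0, 0.0]

z, evidence = concept_vector("the plot was great".split(), scorers)
label = predict(z, weights, bias)
print(z, evidence, label)  # [1, 0] [('great',), None] 0
```

Because the classifier only ever sees `z`, the explanation for a prediction is exactly the list of detected concepts together with the excerpt that triggered each one.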