A Framework for Causal Concept-based Model Explanations
This work addresses the need for better post-hoc explainability in AI, particularly for non-interpretable models. The contribution appears incremental, since it builds on existing concept-based and causal explanation methods.
The authors address the problem of generating understandable and faithful explanations for non-interpretable AI models by proposing a causal concept-based framework. They illustrate it with example explanations produced by a proof-of-concept model for classifiers trained on the CelebA dataset.
This work presents a conceptual framework for causal concept-based post-hoc Explainable Artificial Intelligence (XAI), based on the requirements that explanations for non-interpretable models should be understandable as well as faithful to the model being explained. Local and global explanations are generated by calculating the probability of sufficiency of concept interventions. Example explanations are presented, generated with a proof-of-concept model made to explain classifiers trained on the CelebA dataset. Understandability is demonstrated through a clear concept-based vocabulary, subject to an implicit causal interpretation. Fidelity is addressed by highlighting important framework assumptions, stressing that the context of explanation interpretation must align with the context of explanation generation.
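To make the central quantity concrete: in the standard causal-inference sense, the probability of sufficiency of setting X to x for outcome y is P(Y_x = y | X = x', Y = y'), that is, the probability that the intervention produces the outcome in cases where neither the cause nor the outcome was originally present. The following is a minimal sketch of how such a quantity could be estimated by counting, not the authors' implementation. The names `probability_of_sufficiency`, `toy_classifier`, `concept_a`, and `concept_b` are hypothetical, and the sketch assumes binary concepts and applies the intervention directly to the classifier's concept inputs, without propagating it through a causal model of the concepts as the framework presumably would.

```python
from typing import Callable, Dict, List, Optional

Concepts = Dict[str, int]  # hypothetical representation: concept name -> binary value


def probability_of_sufficiency(
    samples: List[Concepts],
    classifier: Callable[[Concepts], int],
    concept: str,
    value: int,
    target: int,
) -> Optional[float]:
    """Estimate P(Y_{concept=value} = target | concept != value, Y != target).

    Simplifying assumption: the intervention overwrites one concept and leaves
    all other concepts unchanged (no downstream causal effects are modelled).
    """
    # Condition on the factual world: concept not yet at `value`, and the
    # classifier does not yet predict `target`.
    eligible = [
        s for s in samples
        if s[concept] != value and classifier(s) != target
    ]
    if not eligible:
        return None  # the conditioning event never occurs in the sample

    # Apply the intervention do(concept = value) and count flipped predictions.
    flipped = sum(
        1 for s in eligible
        if classifier({**s, concept: value}) == target
    )
    return flipped / len(eligible)


def toy_classifier(s: Concepts) -> int:
    # Hypothetical stand-in for the explained model: predicts 1 only when
    # both concepts are present.
    return int(s["concept_a"] == 1 and s["concept_b"] == 1)


# All four combinations of the two binary concepts.
samples = [
    {"concept_a": a, "concept_b": b}
    for a in (0, 1)
    for b in (0, 1)
]

ps = probability_of_sufficiency(samples, toy_classifier, "concept_a", 1, 1)
print(ps)
```

Run over a whole dataset, an estimate of this kind corresponds to a global explanation; restricting `samples` to the neighbourhood of a single input would give a local one. In the toy example, setting `concept_a` is sufficient only when `concept_b` is already present, so the estimate depends on the population it is computed over. This mirrors the abstract's point that the context of interpretation must match the context of generation.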