LG · Oct 16, 2024

ConLUX: Concept-Based Local Unified Explanations

arXiv:2410.12439v14.61 citations · h-index: 2

Originality: Incremental advance

AI Analysis

The work addresses the need for better interpretability of machine learning models for both users and developers, though it is incremental in that it builds on existing local explanation techniques.

The paper tackles the problem of generating model-agnostic explanations that are faithful and understandable by proposing ConLUX, a framework that automatically extracts high-level concepts from pre-trained models to provide concept-based local explanations. The results show that ConLUX offers more faithful and understandable explanations than vanilla techniques and outperforms state-of-the-art concept-based methods for text and image models.

With the rapid advancement of machine learning models, there is significant demand for model-agnostic explanation techniques, which can explain these models across different architectures. Mainstream model-agnostic explanation techniques generate local explanations based on basic features (e.g., words for text models and (super-)pixels for image models). However, these explanations often do not align with the decision-making processes of the target models and end-users, resulting in explanations that are unfaithful and difficult for users to understand. On the other hand, concept-based techniques provide explanations based on high-level features (e.g., topics for text models and objects for image models), but most are model-specific or require additional pre-defined external concept knowledge. To address this limitation, we propose ConLUX, a general framework to provide concept-based local explanations for any machine learning model. Our key insight is that we can automatically extract high-level concepts from large pre-trained models, and uniformly extend existing local model-agnostic techniques to provide unified concept-based explanations. We have instantiated ConLUX on four different types of explanation techniques: LIME, Kernel SHAP, Anchor, and LORE, and applied these techniques to text and image models. Our evaluation results demonstrate that 1) compared to the vanilla versions, ConLUX offers more faithful explanations and makes them more understandable to users, and 2) by offering multiple forms of explanations, ConLUX outperforms state-of-the-art concept-based explanation techniques specifically designed for text and image models, respectively.
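To make the idea of lifting a local explainer from words to concepts concrete, the following is a minimal Python sketch of a LIME-style surrogate fitted over concept masks rather than word masks. It is an illustration under stated assumptions, not the ConLUX implementation: the function name `concept_lime`, the toy model `toy_predict`, the hand-written concept groups, and the kernel and ridge settings are all hypothetical. In particular, the abstract says concepts are extracted automatically from large pre-trained models, whereas here they are supplied by hand.

```python
import numpy as np


def concept_lime(words, concepts, predict, n_samples=500, kernel_width=0.75, seed=0):
    """Fit a weighted linear surrogate over concepts (LIME-style).

    words:    list of tokens in the input being explained
    concepts: dict mapping a concept name to the token indices it covers
              (hand-written here; a stand-in for automatically extracted concepts)
    predict:  callable taking a list of tokens and returning a numeric score
    """
    rng = np.random.default_rng(seed)
    names = list(concepts)
    k = len(names)

    # Each row is a binary mask: 1 keeps a concept, 0 removes all of its tokens.
    masks = rng.integers(0, 2, size=(n_samples, k))
    masks[0] = 1  # the first sample is the unperturbed input

    # Query the black-box model on every perturbed input.
    scores = np.empty(n_samples)
    for i, mask in enumerate(masks):
        dropped = {j for c, keep in zip(names, mask) if not keep for j in concepts[c]}
        scores[i] = predict([w for j, w in enumerate(words) if j not in dropped])

    # Samples closer to the original input (fewer removed concepts) weigh more.
    dist = 1.0 - masks.mean(axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)

    # Weighted ridge regression in closed form; the last column is the intercept.
    X = np.hstack([masks, np.ones((n_samples, 1))])
    Xw = X * weights[:, None]
    coef = np.linalg.solve(Xw.T @ X + 1e-3 * np.eye(k + 1), Xw.T @ scores)
    return dict(zip(names, coef[:k]))


def toy_predict(tokens):
    # Hypothetical sentiment scorer: each of two cue words adds 0.5.
    return 0.5 * ("great" in tokens) + 0.5 * ("loved" in tokens)


words = "the acting was great and I loved the soundtrack".split()
concepts = {
    "acting": [1, 2, 3],   # "acting was great"
    "music": [6, 7, 8],    # "loved the soundtrack"
    "filler": [0, 4, 5],   # "the", "and", "I"
}
explanation = concept_lime(words, concepts, toy_predict)
for name, weight in explanation.items():
    print(f"{name}: {weight:+.3f}")
```

In this toy setup the surrogate should assign roughly equal positive weight to the two concepts that contain the cue words and a weight near zero to the filler concept. The same substitution of concept masks for word masks would carry over in spirit to Kernel SHAP, Anchor, or LORE, each with its own sampling scheme and surrogate form.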
