LG, AI, CV, HC
Apr 4, 2024

Explaining Explainability: Recommendations for Effective Use of Concept Activation Vectors

arXiv:2404.03713v2
12 citations
h-index: 8
Originality: Incremental advance
AI Analysis

This work aims to make concept-based explanations more reliable for users of deep learning models. The advance is incremental, since it builds on existing CAV methods.

The paper investigates three properties of Concept Activation Vectors (CAVs) that affect interpretability: inconsistency across layers, entanglement with other concepts, and spatial dependency. It provides tools to detect these properties and recommendations to mitigate misleading explanations. On a melanoma classification task, it shows that entanglement can cause uninterpretable results and that the choice of negative probe set substantially changes what a CAV means.

Concept-based explanations translate the internal representations of deep learning models into a language that humans are familiar with: concepts. One popular method for finding concepts is Concept Activation Vectors (CAVs), which are learnt using a probe dataset of concept exemplars. In this work, we investigate three properties of CAVs: (1) inconsistency across layers, (2) entanglement with other concepts, and (3) spatial dependency. Each property provides both challenges and opportunities in interpreting models. We introduce tools designed to detect the presence of these properties, provide insight into how each property can lead to misleading explanations, and provide recommendations to mitigate their impact. To demonstrate practical applications, we apply our recommendations to a melanoma classification task, showing how entanglement can lead to uninterpretable results and that the choice of negative probe set can have a substantial impact on the meaning of a CAV. Further, we show that understanding these properties can be used to our advantage. For example, we introduce spatially dependent CAVs to test if a model is translation invariant with respect to a specific concept and class. Our experiments are performed on natural images (ImageNet), skin lesions (ISIC 2019), and a new synthetic dataset, Elements. Elements is designed to capture a known ground truth relationship between concepts and classes. We release this dataset to facilitate further research in understanding and evaluating interpretability methods.
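
As a rough illustration of how a CAV is learnt from a probe dataset, the sketch below fits a linear probe that separates concept activations from negative-set activations and takes the normalised weight vector as the CAV. It is a minimal sketch under stated assumptions, not the paper's implementation: the activations are synthetic Gaussian vectors rather than real layer outputs, the probe is a plain logistic regression trained by gradient descent, and the names `fit_cav` and `cosine_similarity` are hypothetical. Comparing two CAVs by cosine similarity is used here only as a simple proxy for entanglement; the paper's own detection tools may differ.

```python
import numpy as np


def fit_cav(concept_acts, negative_acts, lr=0.1, steps=500):
    """Fit a logistic-regression probe; return its unit-norm weight vector.

    concept_acts:  (n_pos, dim) activations for concept exemplars.
    negative_acts: (n_neg, dim) activations for the negative probe set.
    """
    X = np.vstack([concept_acts, negative_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(negative_acts))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(concept)
        w -= lr * (X.T @ (p - y)) / len(y)       # gradient of the mean log loss
        b -= lr * np.mean(p - y)
    return w / np.linalg.norm(w)


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Synthetic stand-in for layer activations (assumption: 16-dim, unit Gaussian noise).
rng = np.random.default_rng(0)
dim = 16
direction_a = np.zeros(dim)
direction_a[0] = 1.0  # ground-truth direction for hypothetical concept A
direction_b = np.zeros(dim)
direction_b[1] = 1.0  # ground-truth direction for hypothetical concept B

negatives = rng.normal(size=(200, dim))
concept_a = rng.normal(size=(200, dim)) + 3.0 * direction_a
concept_b = rng.normal(size=(200, dim)) + 3.0 * direction_b

cav_a = fit_cav(concept_a, negatives)
cav_b = fit_cav(concept_b, negatives)

# High similarity between CAVs for different concepts would hint at entanglement.
similarity_ab = cosine_similarity(cav_a, cav_b)
```

Because the CAV is defined relative to the negative set, swapping `negatives` for a different probe set changes the direction the probe finds, which is one way to see why the choice of negative probe set affects a CAV's meaning.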

Code Implementations: 1 repo