LG | AI | CV | Apr 15, 2025

Measuring the (Un)Faithfulness of Concept-Based Explanations

arXiv:2504.10833v3 | h-index: 5
Originality: Incremental advance
AI Analysis

This paper addresses misleading evaluations in interpretable AI, a concern for both researchers and practitioners. It shows that prior evaluation methods artificially inflate faithfulness scores. The advance is incremental but important for reliable benchmarking.

The paper evaluates the faithfulness of concept-based explanation methods (CBEMs) for deep vision models. It shows that many state-of-the-art unsupervised CBEMs are not faithful despite claims to the contrary, and it proposes SURF, a method that uses a simple linear surrogate and improved metrics to benchmark faithfulness reliably.

Deep vision models perform input-output computations that are hard to interpret. Concept-based explanation methods (CBEMs) increase interpretability by re-expressing parts of the model with human-understandable semantic units, or concepts. Checking if the derived explanations are faithful -- that is, they represent the model's internal computation -- requires a surrogate that combines concepts to compute the output. Simplifications made for interpretability inevitably reduce faithfulness, resulting in a tradeoff between the two. State-of-the-art unsupervised CBEMs (U-CBEMs) have reported increasingly interpretable concepts, while also being more faithful to the model. However, we observe that the reported improvement in faithfulness artificially results from either (1) using overly complex surrogates, which introduces an unmeasured cost to the explanation's interpretability, or (2) relying on deletion-based approaches that, as we demonstrate, do not properly measure faithfulness. We propose Surrogate Faithfulness (SURF), which (1) replaces prior complex surrogates with a simple, linear surrogate that measures faithfulness without changing the explanation's interpretability and (2) introduces well-motivated metrics that assess loss across all output classes, not just the predicted class. We validate SURF with a measure-over-measure study by proposing a simple sanity check -- explanations with random concepts should be less faithful -- which prior surrogates fail. SURF enables the first reliable faithfulness benchmark of U-CBEMs, revealing that many visually compelling U-CBEMs are not faithful. Code to be released.
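The abstract's two ingredients, a linear surrogate and an error measured over all output classes, together with the random-concept sanity check, can be illustrated on synthetic data. The following is a minimal sketch and not the authors' implementation: the function names, the synthetic "model" (a linear head over a few hidden directions), and the choice of total-variation distance as the all-class error are assumptions made for illustration only. The paper's actual metrics are not specified in this summary.

```python
import numpy as np


def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def add_bias(concepts):
    # Append a constant column so the surrogate can learn an offset.
    return np.hstack([concepts, np.ones((len(concepts), 1))])


def fit_linear_surrogate(concepts, logits):
    # Least-squares linear map from concept activations to model logits.
    weights, *_ = np.linalg.lstsq(add_bias(concepts), logits, rcond=None)
    return weights


def all_class_error(model_logits, surrogate_logits):
    # Illustrative all-class metric (an assumption, not the paper's):
    # mean total-variation distance between the two output distributions.
    p, q = softmax(model_logits), softmax(surrogate_logits)
    return float(0.5 * np.abs(p - q).sum(axis=1).mean())


def predicted_class_error(model_logits, surrogate_logits):
    # Contrast metric: looks only at the probability of the model's top class.
    p, q = softmax(model_logits), softmax(surrogate_logits)
    top = p.argmax(axis=1)
    rows = np.arange(len(top))
    return float(np.abs(p[rows, top] - q[rows, top]).mean())


def run_sanity_check(seed=0, n=400, d=32, k=4, c=5):
    # Synthetic stand-in for a vision model: activations in d dimensions,
    # logits that depend on only k hidden directions.
    rng = np.random.default_rng(seed)
    acts = rng.normal(size=(n, d))
    true_dirs = rng.normal(size=(k, d))
    head = rng.normal(size=(k, c))
    model_logits = (acts @ true_dirs.T) @ head

    rand_dirs = rng.normal(size=(k, d))  # "random concepts" baseline
    train, test = slice(0, n // 2), slice(n // 2, n)

    results = {}
    for name, dirs in [("aligned", true_dirs), ("random", rand_dirs)]:
        concepts = acts @ dirs.T  # concept activations per sample
        weights = fit_linear_surrogate(concepts[train], model_logits[train])
        surrogate = add_bias(concepts[test]) @ weights
        results[name] = {
            "all_class": all_class_error(model_logits[test], surrogate),
            "predicted_class": predicted_class_error(model_logits[test], surrogate),
        }
    return results
```

Calling `run_sanity_check()` returns held-out errors for concepts aligned with the directions the synthetic model uses and for random concept directions. Under the sanity check described in the abstract, the random concepts should score as less faithful, that is, with a larger error.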

