CV · AI · Oct 13, 2025

FACE: Faithful Automatic Concept Extraction

arXiv:2510.11675v1 · 5 citations · h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses unfaithful concept-based explanations of deep neural networks, a known problem in interpretability research, and represents an incremental improvement over prior methods.

The paper tackles the problem of aligning automatically extracted concepts with a model's true decision-making process in order to improve explanation faithfulness. It reports that FACE outperforms existing methods on faithfulness and sparsity metrics across the ImageNet, COCO, and CelebA datasets.

Interpreting deep neural networks through concept-based explanations offers a bridge between low-level features and high-level human-understandable semantics. However, existing automatic concept discovery methods often fail to align these extracted concepts with the model's true decision-making process, thereby compromising explanation faithfulness. In this work, we propose FACE (Faithful Automatic Concept Extraction), a novel framework that augments Non-negative Matrix Factorization (NMF) with a Kullback-Leibler (KL) divergence regularization term to ensure alignment between the model's original and concept-based predictions. Unlike prior methods that operate solely on encoder activations, FACE incorporates classifier supervision during concept learning, enforcing predictive consistency and enabling faithful explanations. We provide theoretical guarantees showing that minimizing the KL divergence bounds the deviation in predictive distributions, thereby promoting faithful local linearity in the learned concept space. Systematic evaluations on ImageNet, COCO, and CelebA datasets demonstrate that FACE outperforms existing methods across faithfulness and sparsity metrics.
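The core idea in the abstract, NMF augmented with a KL term that ties concept-based predictions to the model's original predictions, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a linear classifier head (`H`, `b`) on top of non-negative encoder activations `A`, a KL direction of original-to-reconstructed, a weight `lam`, and plain projected gradient descent as the optimizer. All function and parameter names are hypothetical.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with max subtraction for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def kl_rows(p, q, eps=1e-12):
    # KL(p || q) for each row of two row-stochastic matrices.
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)

def kl_regularized_nmf(A, H, b, k=3, lam=1.0, lr=0.02, steps=1000, seed=0):
    """Factor A (n x d, non-negative) as U @ W with U, W >= 0.

    Assumed objective (illustrative, not taken from the paper):
        ||A - U W||_F^2 / n + lam * mean_i KL(p_i || q_i)
    where p = softmax(A H + b) are the original predictions and
    q = softmax(U W H + b) are the concept-based predictions.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    U = rng.random((n, k))   # per-sample concept coefficients
    W = rng.random((k, d))   # concept directions in activation space
    p = softmax(A @ H + b)   # fixed targets from the original activations
    history = []
    for _ in range(steps):
        R = U @ W
        q = softmax(R @ H + b)
        history.append(np.sum((A - R) ** 2) / n + lam * kl_rows(p, q).mean())
        # Gradient w.r.t. the reconstruction R; d KL / d logits = q - p.
        G = (2.0 * (R - A) + lam * (q - p) @ H.T) / n
        gU, gW = G @ W.T, U.T @ G
        # Gradient step followed by projection onto the non-negative orthant.
        U = np.maximum(U - lr * gU, 0.0)
        W = np.maximum(W - lr * gW, 0.0)
    return U, W, history

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = np.abs(rng.normal(size=(60, 8)))   # toy non-negative activations
    H = 0.5 * rng.normal(size=(8, 4))      # toy linear classifier weights
    U, W, history = kl_regularized_nmf(A, H, np.zeros(4))
    print(f"objective: {history[0]:.3f} -> {history[-1]:.3f}")
```

Setting `lam=0.0` in this sketch reduces it to ordinary NMF fitted by projected gradient descent, which operates on activations alone. The KL term is what brings the classifier into the factorization, mirroring the classifier supervision the abstract describes.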
