Sparse Subspace Clustering for Concept Discovery (SSCCD)
This work addresses the need for concept-based explanations in AI interpretability, offering a method to uncover hidden model behaviors beyond local attribution methods, though it appears incremental as it adapts existing clustering techniques to a new context.
The paper tackles the problem of identifying coherent model behavior across samples in explainable AI by defining concepts as low-dimensional subspaces of hidden feature layers and applying sparse subspace clustering to discover them, empirically demonstrating soundness on various image classification tasks.
Concepts are key building blocks of higher level human understanding. Explainable AI (XAI) methods have shown tremendous progress in recent years, however, local attribution methods do not allow to identify coherent model behavior across samples and therefore miss this essential component. In this work, we study concept-based explanations and put forward a new definition of concepts as low-dimensional subspaces of hidden feature layers. We novelly apply sparse subspace clustering to discover these concept subspaces. Moving forward, we derive insights from concept subspaces in terms of localized input (concept) maps, show how to quantify concept relevances and lastly, evaluate similarities and transferability between concepts. We empirically demonstrate the soundness of the proposed Sparse Subspace Clustering for Concept Discovery (SSCCD) method for a variety of different image classification tasks. This approach allows for deeper insights into the actual model behavior that would remain hidden from conventional input-level heatmaps.