Vector Quantized Latent Concepts: A Scalable Alternative to Clustering-Based Concept Discovery
This work addresses the problem of scalable concept discovery for post-hoc model explanations, offering a more efficient alternative to clustering methods for researchers and practitioners in interpretable AI.
The paper tackles the challenge of understanding which semantic information deep learning models rely on for predictions. It proposes the Vector Quantized Latent Concept (VQLC) method, which improves scalability while maintaining explanation quality comparable to that of clustering-based approaches.
Deep learning models encode rich semantic information in their hidden representations, yet it remains challenging to understand which parts of this information models actually rely on when making predictions. A promising line of post-hoc concept-based explanation methods relies on clustering token representations. However, commonly used approaches such as hierarchical clustering are computationally infeasible for large-scale datasets, and K-Means often yields shallow or frequency-dominated clusters. We propose the vector quantized latent concept (VQLC) method, a framework built upon the vector quantized variational autoencoder (VQ-VAE) architecture that learns a discrete codebook mapping continuous representations to concept vectors. We perform thorough evaluations and show that VQLC improves scalability while maintaining comparable quality of human-understandable explanations.
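
To make the codebook idea concrete, the following is a minimal sketch of the nearest-code lookup that vector quantization performs in general: each continuous representation is assigned to the closest codebook vector, and the index of that vector acts as a discrete concept ID. It is an illustration only, not the VQLC implementation. The codebook values, the toy 2-D token representations, and the function names `squared_distance` and `quantize` are all hypothetical, and the training procedure that learns the codebook is omitted.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(representations, codebook):
    """Return, for each representation, the index of its nearest code vector."""
    indices = []
    for rep in representations:
        distances = [squared_distance(rep, code) for code in codebook]
        indices.append(distances.index(min(distances)))
    return indices


# Hypothetical codebook with four 2-D code vectors (assumed already learned).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical token representations (real hidden states are much higher-dimensional).
token_reps = [[0.1, -0.05], [0.9, 0.2], [0.05, 1.1], [0.8, 0.95], [0.95, 0.1]]

concept_ids = quantize(token_reps, codebook)

# Group token positions by the concept (code index) they were assigned to.
tokens_by_concept = defaultdict(list)
for position, concept in enumerate(concept_ids):
    tokens_by_concept[concept].append(position)

print(concept_ids)
print(dict(tokens_by_concept))
```

In this sketch each lookup costs one distance computation per code vector, so assigning N representations to K codes takes on the order of N × K distance computations and avoids the pairwise distance matrix that hierarchical clustering typically requires.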