Towards Compositionality in Concept Learning
This work addresses the need for better interpretability in foundation models by improving the compositionality of extracted concepts. Its contribution is incremental, since it builds on existing concept-based methods.
The paper tackles non-compositional concept representations produced by unsupervised concept extraction methods used for interpretability. It proposes Compositional Concept Extraction (CCE), which finds more compositional concepts than baselines and improves accuracy on four downstream classification tasks, with evaluation on five image and text datasets.
Concept-based interpretability methods offer a lens into the internals of foundation models by decomposing their embeddings into high-level concepts. These concept representations are most useful when they are compositional, meaning that the individual concepts compose to explain the full sample. We show that existing unsupervised concept extraction methods find concepts that are not compositional. To automatically discover compositional concept representations, we identify two salient properties of such representations and propose Compositional Concept Extraction (CCE) for finding concepts that obey these properties. We evaluate CCE on five different datasets over image and text data. Our evaluation shows that CCE finds more compositional concept representations than baselines and yields better accuracy on four downstream classification tasks. Code and data are available at https://github.com/adaminsky/compositional_concepts.
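To make the notion of compositionality in the abstract concrete, the following is a minimal toy sketch, not the CCE method itself. It assumes that concepts are vectors in the embedding space and that "composing" them means summing them; the concept names, the vectors, and the helper functions `compose` and `cosine` are all hypothetical and chosen only for illustration. Under these assumptions, a sample is well explained by its concepts when its embedding points in nearly the same direction as the sum of its concept vectors.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def compose(vectors):
    """Assumed composition rule: element-wise sum of concept vectors."""
    return [sum(values) for values in zip(*vectors)]


# Hypothetical concept vectors (not taken from the paper).
concepts = {
    "red": [1.0, 0.0, 0.0],
    "square": [0.0, 1.0, 0.0],
}

# Hypothetical embedding of a sample depicting a red square.
sample = [0.9, 1.1, 0.05]

# Compose the two concepts and compare against the sample embedding.
composed = compose([concepts["red"], concepts["square"]])
score = cosine(sample, composed)

# For contrast: how well a single concept explains the sample on its own.
single_score = cosine(sample, concepts["red"])

print(f"composed vs. sample:    {score:.3f}")
print(f"single concept vs. sample: {single_score:.3f}")
```

In this toy setting the composed vector aligns with the sample much more closely than either concept alone, which is the behavior a compositional concept representation is expected to show. Real concept extraction operates on learned embeddings, where this alignment has to be measured rather than assumed.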