LG | AI | ST | ML | Feb 14, 2024

Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models

arXiv:2402.09236v2 | 36 citations | h-index: 23
AI Analysis

This work addresses interpretability in machine learning, a concern for both researchers and practitioners, though the contribution appears incremental because it builds on existing fields.

The paper tackles the problem of learning human-interpretable concepts by unifying causal representation learning and foundation models. It shows that concepts can be provably recovered from diverse data and supports the approach with experiments on synthetic data and large language models.

To build intelligent machine learning systems, there are two broad approaches. One approach is to build inherently interpretable models, as endeavored by the growing field of causal representation learning. The other approach is to build highly-performant foundation models and then invest efforts into understanding how they work. In this work, we relate these two approaches and study how to learn human-interpretable concepts from data. Weaving together ideas from both fields, we formally define a notion of concepts and show that they can be provably recovered from diverse data. Experiments on synthetic data and large language models show the utility of our unified approach.

