Sample-efficient Learning of Concepts with Theoretical Guarantees: from Data to Concepts without Interventions
This work addresses interpretability and robustness issues in AI systems for domains such as image analysis, offering a sample-efficient method to improve concept learning without costly interventions. The contribution is incremental, since it builds on existing CBM and causal representation learning approaches.
The paper tackles the problem of spurious correlations in Concept Bottleneck Models (CBMs) by proposing a framework that learns interpretable concepts from high-dimensional data without requiring interventions, using causal representation learning and providing theoretical guarantees on concept correctness and label efficiency. In evaluations on synthetic and image benchmarks, the learned concepts show fewer impurities and are often more accurate than those of other CBMs, even with strong concept correlations.
Machine learning is a vital part of many real-world systems, but several concerns remain about the lack of interpretability, explainability and robustness of black-box AI systems. Concept Bottleneck Models (CBMs) address some of these challenges by learning interpretable concepts from high-dimensional data, e.g., images, which are then used to predict labels. An important issue in CBMs is spurious correlations between concepts, which effectively lead to learning "wrong" concepts. Current mitigation strategies rely on strong assumptions, e.g., that the concepts are statistically independent of each other, or require substantial interaction in terms of both interventions and labels provided by annotators. In this paper, we describe a framework that provides theoretical guarantees on the correctness of the learned concepts and on the number of required labels, without requiring any interventions. Our framework leverages causal representation learning (CRL) methods to learn latent causal variables from high-dimensional observations in an unsupervised way, and then learns to align these variables with interpretable concepts using few concept labels. We propose a linear and a non-parametric estimator for this mapping, providing a finite-sample high-probability result in the linear case and an asymptotic consistency result for the non-parametric estimator. We evaluate our framework on synthetic and image benchmarks, showing that the learned concepts have fewer impurities and are often more accurate than other CBMs, even in settings with strong correlations between concepts.
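
To make the alignment step concrete, the following is a minimal sketch under assumptions that are not stated in the abstract: the CRL stage is taken to have already recovered each concept up to an unknown permutation and an element-wise affine rescaling, and the latents are simulated rather than learned. The matching rule (absolute correlation plus an assignment solver) and the per-concept least-squares fit are illustrative stand-ins, not the linear or non-parametric estimators proposed in the paper. The function names `fit_alignment` and `predict_concepts` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def fit_alignment(z_lab, c_lab):
    """Match each concept to one latent dimension, then fit a 1-D affine map.

    z_lab: (n_labeled, d) latents from a (hypothetical) CRL encoder.
    c_lab: (n_labeled, d) concept labels for the same samples.
    """
    d = z_lab.shape[1]
    # Absolute correlation between every latent (rows) and every concept (cols).
    corr = np.abs(np.corrcoef(z_lab.T, c_lab.T)[:d, d:])
    # One-to-one matching that maximizes the total absolute correlation.
    latent_idx, concept_idx = linear_sum_assignment(corr, maximize=True)
    perm = np.empty(d, dtype=int)
    perm[concept_idx] = latent_idx  # perm[j] = latent matched to concept j
    # Per-concept least squares: concept_j ~ slope * latent_perm[j] + intercept.
    coefs = np.array(
        [np.polyfit(z_lab[:, perm[j]], c_lab[:, j], deg=1) for j in range(d)]
    )
    return perm, coefs


def predict_concepts(z, perm, coefs):
    """Apply the fitted permutation and affine maps to new latents."""
    return z[:, perm] * coefs[:, 0] + coefs[:, 1]


# Synthetic data (assumption): strongly correlated Gaussian concepts.
rng = np.random.default_rng(0)
d, n, n_labeled = 3, 2000, 20
cov = np.full((d, d), 0.8) + 0.2 * np.eye(d)  # pairwise correlation 0.8
concepts = rng.multivariate_normal(np.zeros(d), cov, size=n)

# Simulated CRL output: concept j is stored in latent true_perm[j],
# rescaled and shifted, with a small amount of noise.
true_perm = np.array([2, 0, 1])
scales = np.array([2.0, -1.5, 1.0])
latents = np.empty_like(concepts)
latents[:, true_perm] = concepts * scales + 0.5
latents += 0.05 * rng.normal(size=latents.shape)

# Only the first n_labeled samples carry concept labels.
perm, coefs = fit_alignment(latents[:n_labeled], concepts[:n_labeled])
pred = predict_concepts(latents[n_labeled:], perm, coefs)
mse = float(np.mean((pred - concepts[n_labeled:]) ** 2))
print("recovered permutation:", perm, "held-out MSE:", mse)
```

In this toy setting, the correct latent for each concept correlates almost perfectly with it, while the other latents correlate only at about the 0.8 level induced by the concept covariance, so a handful of labels is enough to recover the matching even though the concepts are strongly correlated. Real CRL outputs may satisfy weaker identifiability guarantees, in which case this simple matching would not be sufficient.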