Diverse Concept Proposals for Concept Bottleneck Models
This work addresses interpretability issues in concept bottleneck models for domains such as healthcare, offering a way to reconcile predictive accuracy with expert expectations. The contribution is incremental: it builds on existing models by adding diversity in concept proposals.
The paper tackles the challenge of learning human-interpretable concepts in concept bottleneck models, where the most predictive concepts may not align with expert intuition. It proposes an approach that identifies multiple predictive concepts so that experts can choose the best-fitting explanation. The results show that the method discovered all possible concept representations on synthetic data and identified 4 out of 5 pre-defined concepts on EHR data without supervision.
Concept bottleneck models are interpretable predictive models that are often used in domains where model trust is a key priority, such as healthcare. They identify a small number of human-interpretable concepts in the data, which they then use to make predictions. Learning relevant concepts from data is a challenging task. The most predictive concepts may not align with expert intuition, in which case the model fails to be interpretable and the expert has no recourse. Our proposed approach identifies a number of predictive concepts that explain the data. By offering multiple alternative explanations, we allow the human expert to choose the one that best aligns with their expectations. To demonstrate our method, we show that it is able to discover all possible concept representations on a synthetic dataset. On EHR data, our model was able to identify 4 out of the 5 pre-defined concepts without supervision.
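The two-stage structure described above (input features, then a small set of concepts, then a prediction) can be illustrated with a minimal sketch. This is not the authors' method and does not cover how multiple concept proposals are generated. It only shows a generic bottleneck forward pass, assuming sigmoid concept activations and a logistic output layer. The class name ConceptBottleneck, its methods, and the randomly initialised weights are hypothetical; a real model would learn the weights from data.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


class ConceptBottleneck:
    """Toy two-stage model: features -> k concept scores -> prediction.

    Hypothetical illustration only; weights are random, not learned.
    """

    def __init__(self, n_features, n_concepts, seed=0):
        rng = random.Random(seed)
        # One weight row per concept (assumed linear concept detectors).
        self.W_c = [[rng.gauss(0, 1) for _ in range(n_features)]
                    for _ in range(n_concepts)]
        self.b_c = [0.0] * n_concepts
        # Output layer sees only the concept scores, never the raw input.
        self.w_y = [rng.gauss(0, 1) for _ in range(n_concepts)]
        self.b_y = 0.0

    def concepts(self, x):
        """Map a feature vector to concept scores in (0, 1)."""
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W_c, self.b_c)]

    def predict_from_concepts(self, c):
        """Map concept scores to a probability for the positive class."""
        return sigmoid(sum(w * ci for w, ci in zip(self.w_y, c)) + self.b_y)

    def predict(self, x):
        """Full forward pass through the bottleneck."""
        return self.predict_from_concepts(self.concepts(x))


model = ConceptBottleneck(n_features=6, n_concepts=3)
x = [0.5, -1.2, 0.0, 2.0, 0.3, -0.7]
c = model.concepts(x)   # three concept scores
y = model.predict(x)    # probability computed from c alone
```

In this sketch the prediction depends on the input only through the concept scores c, which is why the learned concepts act as the explanation and why concepts that do not match expert intuition undermine interpretability.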