LG · HC · ML · Dec 4, 2020

Learning Interpretable Concept-Based Models with Human Feedback

arXiv:2012.02898v1 · 28 citations
AI Analysis

This work provides a more efficient and intuitive way for users to define concepts in high-dimensional tabular data, which is crucial for improving the interpretability and transparency of machine learning models in domains like clinical decision-making.

The paper addresses the challenge of learning interpretable concept-based models from high-dimensional tabular data. It proposes an approach in which users label concept features rather than individual instances, yielding concepts that align with user intuition and support transparent prediction. The method is shown to learn ground-truth concept definitions more efficiently than alternative transparent approaches, while maintaining similar predictive performance.

Machine learning models that first learn a representation of a domain in terms of human-understandable concepts, then use it to make predictions, have been proposed to facilitate interpretation and interaction with models trained on high-dimensional data. However, these methods have important limitations: the way they define concepts is not inherently interpretable, and they assume that concept labels either exist for individual instances or can easily be acquired from users. These limitations are particularly acute for high-dimensional tabular features. We propose an approach for learning a set of transparent concept definitions in high-dimensional tabular data that relies on users labeling concept features instead of individual instances. Our method produces concepts that both align with users' intuitive sense of what a concept means and facilitate prediction of the downstream label by a transparent machine learning model. This ensures that the full model is transparent and intuitive, and as predictive as possible given this constraint. We demonstrate with simulated user feedback on real prediction problems, including one in a clinical domain, that this kind of direct feedback is much more efficient at learning solutions that align with ground-truth concept definitions than alternative transparent approaches that rely on labeling instances or other existing interaction mechanisms, while maintaining similar predictive performance.
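To make the idea of labeling concept features rather than instances concrete, here is a minimal sketch. It is not the paper's algorithm. It assumes binary tabular features and a simple OR rule, under which a concept is active for an instance if any feature the user assigned to that concept is active; the abstract does not specify this rule. The feature names, concept names, and the helper `concept_representation` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def concept_representation(
    rows: Sequence[Sequence[int]],
    feature_names: Sequence[str],
    concept_features: Dict[str, List[str]],
) -> Tuple[List[str], List[List[int]]]:
    """Map binary feature rows to binary concept rows.

    Assumption (not from the paper): a concept is active when ANY of the
    features the user labeled as belonging to it is active.
    """
    index = {name: i for i, name in enumerate(feature_names)}
    concept_names = sorted(concept_features)
    concept_rows = [
        [int(any(row[index[f]] for f in concept_features[c])) for c in concept_names]
        for row in rows
    ]
    return concept_names, concept_rows


# Hypothetical toy data: five binary clinical features, three instances.
feature_names = ["metformin_rx", "insulin_rx", "hba1c_high", "statin_rx", "ldl_high"]

# Simulated user feedback: the user labels features, not instances.
user_labels = {
    "diabetes": ["metformin_rx", "insulin_rx", "hba1c_high"],
    "lipid_disorder": ["statin_rx", "ldl_high"],
}

rows = [
    [1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]

names, concepts = concept_representation(rows, feature_names, user_labels)
print(names)
print(concepts)
```

Because each concept is a readable set of features, the resulting low-dimensional concept matrix could then be passed to any transparent downstream model, such as a sparse linear classifier. That choice is also an assumption of this sketch.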

