MLJun 5, 2022
Model-Agnostic Confidence Intervals for Feature Importance: A Fast and Powerful Approach Using Minipatch EnsemblesLuqin Gan, Lili Zheng, Genevera I. Allen
To promote new scientific discoveries from complex data sets, feature importance inference has been a long-standing statistical problem. Instead of testing for parameters that are only interpretable for specific models, there has been increasing interest in model-agnostic methods, often in the form of feature occlusion or leave-one-covariate-out (LOCO) inference. Existing approaches often make distributional assumptions, which can be difficult to verify in practice, or require model refitting and data splitting, which are computationally intensive and lead to losses in power. In this work, we develop a novel, mostly model-agnostic and distribution-free inference framework for feature importance that is computationally efficient and statistically powerful. Our approach is fast as we avoid model refitting by leveraging a form of random observation and feature subsampling called minipatch ensembles; this approach also improves statistical power by avoiding data splitting. Our framework can be applied on tabular data and with any machine learning algorithm, together with minipatch ensembles, for regression and classification tasks. Despite the dependencies induced by using minipatch ensembles, we show that our approach provides asymptotic coverage for the feature importance score of any model under mild assumptions. Finally, our same procedure can also be leveraged to provide valid confidence intervals for predictions, hence providing fast, simultaneous quantification of the uncertainty of both predictions and feature importance. We validate our intervals on a series of synthetic and real data examples, including non-linear settings, showing that our approach detects the correct important features and exhibits many computational and statistical advantages over existing methods.
MLAug 2, 2023
Interpretable Machine Learning for Discovery: Statistical Challenges \& OpportunitiesGenevera I. Allen, Luqin Gan, Lili Zheng
New technologies have led to vast troves of large and complex datasets across many scientific domains and industries. People routinely use machine learning techniques to not only process, visualize, and make predictions from this big data, but also to make data-driven discoveries. These discoveries are often made using Interpretable Machine Learning, or machine learning models and techniques that yield human understandable insights. In this paper, we discuss and review the field of interpretable machine learning, focusing especially on the techniques as they are often employed to generate new knowledge or make discoveries from large data sets. We outline the types of discoveries that can be made using Interpretable Machine Learning in both supervised and unsupervised settings. Additionally, we focus on the grand challenge of how to validate these discoveries in a data-driven manner, which promotes trust in machine learning systems and reproducibility in science. We discuss validation from both a practical perspective, reviewing approaches based on data-splitting and stability, as well as from a theoretical perspective, reviewing statistical results on model selection consistency and uncertainty quantification via statistical inference. Finally, we conclude by highlighting open challenges in using interpretable machine learning techniques to make discoveries, including gaps between theory and practice for validating data-driven-discoveries.
MLMay 21, 2025Code
Are machine learning interpretations reliable? A stability study on global interpretationsLuqin Gan, Tarek M. Zikry, Genevera I. Allen
As machine learning systems are increasingly used in high-stakes domains, there is a growing emphasis placed on making them interpretable to improve trust in these systems. In response, a range of interpretable machine learning (IML) methods have been developed to generate human-understandable insights into otherwise black box models. With these methods, a fundamental question arises: Are these interpretations reliable? Unlike with prediction accuracy or other evaluation metrics for supervised models, the proximity to the true interpretation is difficult to define. Instead, we ask a closely related question that we argue is a prerequisite for reliability: Are these interpretations stable? We define stability as findings that are consistent or reliable under small random perturbations to the data or algorithms. In this study, we conduct the first systematic, large-scale empirical stability study on popular machine learning global interpretations for both supervised and unsupervised tasks on tabular data. Our findings reveal that popular interpretation methods are frequently unstable, notably less stable than the predictions themselves, and that there is no association between the accuracy of machine learning predictions and the stability of their associated interpretations. Moreover, we show that no single method consistently provides the most stable interpretations across a range of benchmark datasets. Overall, these results suggest that interpretability alone does not warrant trust, and underscores the need for rigorous evaluation of interpretation stability in future work. To support these principles, we have developed and released an open source IML dashboard and Python package to enable researchers to assess the stability and reliability of their own data-driven interpretations and discoveries.
MLOct 5, 2021
Fast and Interpretable Consensus Clustering via Minipatch LearningLuqin Gan, Genevera I. Allen
Consensus clustering has been widely used in bioinformatics and other applications to improve the accuracy, stability and reliability of clustering results. This approach ensembles cluster co-occurrences from multiple clustering runs on subsampled observations. For application to large-scale bioinformatics data, such as to discover cell types from single-cell sequencing data, for example, consensus clustering has two significant drawbacks: (i) computational inefficiency due to repeatedly applying clustering algorithms, and (ii) lack of interpretability into the important features for differentiating clusters. In this paper, we address these two challenges by developing IMPACC: Interpretable MiniPatch Adaptive Consensus Clustering. Our approach adopts three major innovations. We ensemble cluster co-occurrences from tiny subsets of both observations and features, termed minipatches, thus dramatically reducing computation time. Additionally, we develop adaptive sampling schemes for observations, which result in both improved reliability and computational savings, as well as adaptive sampling schemes of features, which leads to interpretable solutions by quickly learning the most relevant features that differentiate clusters. We study our approach on synthetic data and a variety of real large-scale bioinformatics data sets; results show that our approach not only yields more accurate and interpretable cluster solutions, but it also substantially improves computational efficiency compared to standard consensus clustering approaches.