Statistical Significance of Feature Importance Rankings
This work tackles the problem of unreliable feature attribution for practitioners in interpretable machine learning, offering a statistically rigorous solution.
The paper addresses the instability of feature importance rankings in machine learning. It introduces hypothesis tests that verify the stability of the top-ranked features, together with efficient sampling algorithms that correctly identify the most important features with high probability. Both are validated on SHAP and LIME.
Feature importance scores are ubiquitous tools for understanding the predictions of machine learning models. However, many popular attribution methods suffer from high instability due to random sampling. Leveraging novel ideas from hypothesis testing, we devise techniques that guarantee, with high probability, that the most important features are correctly identified. These techniques assess the set of $K$ top-ranked features, as well as the order of its elements. Given a set of local or global importance scores, we demonstrate how to retrospectively verify the stability of the highest ranks. We then introduce two efficient sampling algorithms that identify the $K$ most important features, optionally in order, with probability exceeding $1-\alpha$. The theoretical justification for these procedures is validated empirically on SHAP and LIME.
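To make the idea of retrospective verification concrete, the following is a minimal sketch and not the paper's procedure. It assumes each importance score is the mean of n random draws, ranks features by that mean, and uses one-sided paired z-tests (a normal approximation) with a Bonferroni correction over all K(d-K) pairs. The function name top_k_is_stable and the simulated data are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def top_k_is_stable(samples, k, alpha=0.05):
    """Check whether the top-k set of features is statistically stable.

    samples: list of n rows (n >= 2), each a list of d per-draw scores.
    Returns (top_k_indices, stable). stable is True only if every top-k
    feature beats every remaining feature in a one-sided paired z-test
    at the Bonferroni-corrected level alpha / (k * (d - k)).
    """
    n, d = len(samples), len(samples[0])
    means = [mean(row[j] for row in samples) for j in range(d)]
    order = sorted(range(d), key=lambda j: means[j], reverse=True)
    top, rest = order[:k], order[k:]
    n_tests = len(top) * len(rest)
    if n_tests == 0:
        return top, True  # nothing to compare against
    level = alpha / n_tests
    for i in top:
        for j in rest:
            diffs = [row[i] - row[j] for row in samples]
            se = stdev(diffs) / math.sqrt(n)
            if se == 0.0:
                # Degenerate case: the difference is constant across draws.
                p_value = 0.0 if mean(diffs) > 0 else 1.0
            else:
                z = mean(diffs) / se
                p_value = 1.0 - NormalDist().cdf(z)
            if p_value > level:
                return top, False  # this pair cannot be separated
    return top, True


# Simulated draws (hypothetical data): two clearly dominant features.
rng = random.Random(0)
true_means = [3.0, 2.0, 0.2, 0.1, 0.0]
draws = [[m + rng.gauss(0.0, 1.0) for m in true_means] for _ in range(500)]
top, stable = top_k_is_stable(draws, k=2)

# Two features with identical scores can never be separated.
tied = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
_, tied_stable = top_k_is_stable(tied, k=1)
```

Under these assumptions, a union bound keeps the probability of wrongly declaring the top-$K$ set stable to roughly $\alpha$ or less. Verifying the order within the top $K$ would require additional pairwise tests between adjacent ranks, which this sketch does not perform.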