Imputation Uncertainty in Interpretable Machine Learning Methods
This work addresses imputation uncertainty for researchers and practitioners who use IML methods. The contribution is incremental: it builds on prior work on bias and extends the analysis to variance and confidence intervals.
The paper tackles the problem of missing values in interpretable machine learning (IML) by comparing how different imputation methods affect confidence interval coverage probabilities for permutation feature importance, partial dependence plots, and Shapley values. It shows that single imputation underestimates variance and that, in most cases, only multiple imputation comes close to nominal coverage.
Missing values occur frequently in real data and affect the interpretations obtained with interpretable machine learning (IML) methods. Recent work considers bias and shows that model explanations may differ between imputation methods, but it ignores the additional uncertainty introduced by imputation and its influence on variance and confidence intervals. We therefore compare the effects of different imputation methods on the confidence interval coverage probabilities of three IML methods: permutation feature importance, partial dependence plots, and Shapley values. We show that single imputation leads to underestimation of variance and that, in most cases, only multiple imputation is close to nominal coverage.
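
To illustrate why single imputation can understate variance, the sketch below pools an IML estimate (for example, a permutation feature importance value) across several imputed datasets using Rubin's rules, the standard pooling approach for multiple imputation. The abstract does not state which pooling rule the authors use, so this is an assumption. The function name `pool_rubin` and the input values are hypothetical, and the interval uses a normal approximation rather than Rubin's t reference distribution.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def pool_rubin(estimates, variances, level=0.95):
    """Pool per-imputation estimates with Rubin's rules (illustrative sketch).

    estimates: one point estimate per imputed dataset
    variances: the matching within-imputation variance of each estimate
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")

    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between

    # Assumption: normal quantile for simplicity instead of a t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(total)
    return {
        "estimate": q_bar,
        "within": within,
        "between": between,
        "total": total,
        "ci": (q_bar - half_width, q_bar + half_width),
    }


# Hypothetical importance estimates from three imputed datasets.
pooled = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled)
```

An interval built from a single imputed dataset would use only the within-imputation variance, so it omits the between-imputation term and comes out narrower whenever the imputations disagree.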