"Why Should You Trust My Explanation?" Understanding Uncertainty in LIME Explanations
The paper addresses trust issues in interpretability methods for users who rely on explanations, but its contribution is incremental, since it focuses on analyzing existing uncertainty in LIME.
The paper identifies and demonstrates two sources of uncertainty in LIME explanations—randomness in sampling and variation across inputs—even in models with high accuracy, using synthetic and public datasets like 20 Newsgroup and COMPAS.
Methods for interpreting machine learning black-box models increase the transparency of the models' outcomes and in turn generate insight into the reliability and fairness of the algorithms. However, the interpretations themselves can contain significant uncertainty, which undermines trust in the outcomes and raises concerns about the model's reliability. Focusing on the method "Local Interpretable Model-agnostic Explanations" (LIME), we demonstrate the presence of two sources of uncertainty, namely the randomness in its sampling procedure and the variation of interpretation quality across different input data points. Such uncertainty is present even in models with high training and test accuracy. We apply LIME to synthetic data and two public data sets, text classification in 20 Newsgroup and recidivism risk-scoring in COMPAS, to support our argument.
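The first source of uncertainty, randomness in the sampling procedure, can be illustrated with a minimal sketch. This is not the `lime` package and not the authors' experimental setup: `black_box`, `lime_like_explanation`, `explanation_spread`, the Gaussian perturbation scheme, and the kernel width are all assumptions chosen for illustration. The sketch fits a locally weighted linear surrogate around one input several times with different random seeds and reports how much the resulting coefficients vary.

```python
import numpy as np


def black_box(X):
    # Hypothetical stand-in for a trained classifier: returns P(class = 1).
    z = 3.0 * X[:, 0] - 2.0 * X[:, 1] + np.sin(4.0 * X[:, 2])
    return 1.0 / (1.0 + np.exp(-z))


def lime_like_explanation(x, predict, n_samples, seed, kernel_width=0.75):
    """Fit a locally weighted linear surrogate around x (simplified, LIME-style)."""
    rng = np.random.default_rng(seed)
    # Draw random perturbations around the point being explained.
    Z = x + rng.normal(size=(n_samples, x.size))
    y = predict(Z)
    # Weight each perturbed sample by its proximity to x (exponential kernel).
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # Weighted least squares with an intercept column.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:]  # drop the intercept, keep per-feature coefficients


def explanation_spread(x, predict, n_samples, n_runs=20):
    """Repeat the explanation with different seeds; return mean and std of coefficients."""
    runs = np.array([
        lime_like_explanation(x, predict, n_samples, seed)
        for seed in range(n_runs)
    ])
    return runs.mean(axis=0), runs.std(axis=0)


if __name__ == "__main__":
    x0 = np.array([0.5, -0.2, 0.1])
    for n in (100, 5000):
        mean, std = explanation_spread(x0, black_box, n_samples=n)
        print(f"n_samples={n}: mean={mean}, std={std}")
```

In this sketch the black box is deterministic, so any spread in the coefficients comes only from the random perturbations. A non-zero standard deviation across seeds therefore corresponds to the sampling uncertainty described above. The second source, variation across input points, could be probed by repeating the same procedure at different values of `x0`.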