How Reliable and Stable are Explanations of XAI Methods?
This work addresses the critical issue of whether users who rely on black-box models can trust XAI explanations, although its contribution is incremental: it builds a new evaluation framework on top of existing methods.
The research evaluated the reliability and stability of Explainable AI (XAI) methods by perturbing the test data of a diabetes dataset and explaining four machine learning models. It found that eXirt identified the most reliable models and that all but one of the XAI methods evaluated were sensitive to the perturbations.
Black-box models are increasingly used in people's daily lives. Alongside this growth, Explainable Artificial Intelligence (XAI) methods have emerged to generate additional explanations of how a model arrives at its predictions. Methods such as Dalex, Eli5, eXirt, Lofo and Shap offer different approaches for explaining black-box models in a model-agnostic way. Their emergence raises questions such as "How reliable and stable are XAI methods?". To shed light on this question, this research builds a pipeline that runs experiments on the diabetes dataset with four machine learning models (LGBM, MLP, DT and KNN). The pipeline applies different levels of perturbation to the test data and then generates explanations from the eXirt method regarding the confidence of the models, as well as feature relevance ranks from all of the XAI methods mentioned, in order to measure their stability under perturbation. The results show that eXirt was able to identify the most reliable models among those used. They also show that current XAI methods are sensitive to perturbations, with the exception of one specific method.
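The stability measurement described above (perturb the test data at several levels, recompute feature relevance ranks, compare them with the unperturbed ranks) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the perturbation is assumed to be zero-mean Gaussian noise scaled by a `level` parameter, the explainer is a simple permutation importance standing in for Dalex, Eli5, eXirt, Lofo or Shap, and rank agreement is measured with Spearman's rank correlation. The model, the weights in `WEIGHTS`, and every function name here are hypothetical.

```python
import random

def perturb(rows, level, seed=0):
    """Assumed perturbation: add zero-mean Gaussian noise with std `level` to each value."""
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, level) for x in row] for row in rows]

def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)

def permutation_importance(model, rows, labels, seed=0):
    """Stand-in explainer: accuracy drop when one feature column is shuffled."""
    rng = random.Random(seed)
    base = accuracy(model, rows, labels)
    scores = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        rng.shuffle(col)
        shuffled = [r[:j] + [c] + r[j + 1:] for r, c in zip(rows, col)]
        scores.append(base - accuracy(model, shuffled, labels))
    return scores

def rank_features(scores):
    """Feature indices ordered from most to least relevant (stable on ties)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def spearman(rank_a, rank_b):
    """Spearman correlation between two rankings of the same n features (no ties)."""
    n = len(rank_a)
    pos_a = {f: p for p, f in enumerate(rank_a)}
    pos_b = {f: p for p, f in enumerate(rank_b)}
    d2 = sum((pos_a[f] - pos_b[f]) ** 2 for f in pos_a)
    return 1 - 6 * d2 / (n * (n * n - 1))

def rank_stability(model, rows, labels, levels, explain):
    """Compare the ranking at each perturbation level with the unperturbed ranking."""
    baseline = rank_features(explain(model, rows, labels))
    return {
        lvl: spearman(baseline, rank_features(explain(model, perturb(rows, lvl), labels)))
        for lvl in levels
    }

# Hypothetical usage: a fixed linear classifier on synthetic 3-feature data.
WEIGHTS = [3.0, 1.0, 0.2]

def model(row):
    return int(sum(w * x for w, x in zip(WEIGHTS, row)) > 0)

data_rng = random.Random(42)
rows = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(500)]
labels = [model(r) for r in rows]
result = rank_stability(model, rows, labels, [0.0, 0.5, 2.0], permutation_importance)
print(result)
```

A value of 1.0 means the ranking is unchanged by the perturbation, and lower values indicate a less stable explainer. Spearman correlation is only one possible choice; the original study may use a different rank-comparison measure.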