LIPEx - Locally Interpretable Probabilistic Explanations - To Look Beyond The True Class
This work addresses the need for efficient and insightful explanations in Explainable AI, particularly for multi-class scenarios, though it appears incremental as it builds on existing perturbation-based methods.
The authors tackled the problem of generating interpretable explanations for multi-class classification models by introducing LIPEx, a perturbation-based framework that locally replicates a model's output probability distribution and shows how each important feature affects the predicted probability of every class. They demonstrated that, in ablation tests on text and image data, LIPEx-guided feature removal changes the model's predictions more than other XAI methods do, and that in text classification experiments LIPEx computes its explanations about 53% faster than all-class LIME.
In this work, we instantiate a novel perturbation-based multi-class explanation framework, LIPEx (Locally Interpretable Probabilistic Explanation). We demonstrate that LIPEx not only locally replicates the probability distributions output by widely used complex classification models but also provides insight into how every feature deemed important affects the prediction probability for each of the possible classes. We achieve this by defining the explanation as a matrix obtained via regression with respect to the Hellinger distance in the space of probability distributions. Ablation tests on text and image data show that LIPEx-guided removal of important features from the data causes more change in the predictions of the underlying model than similar tests based on other saliency-based or feature importance-based Explainable AI (XAI) methods. It is also shown that, compared to LIME, LIPEx is more data efficient: it needs fewer perturbations of the data to obtain a reliable explanation. This data efficiency manifests as LIPEx computing its explanation matrix around 53% faster than all-class LIME in classification experiments with text data.
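The abstract describes the explanation as a matrix fitted by regression under the Hellinger distance. The NumPy sketch below illustrates that idea under several assumptions that the text does not confirm: each perturbation is a binary mask z over d features, the surrogate distribution is softmax(Wz) for a k×d matrix W (k classes), and W is chosen to minimize the mean squared Hellinger distance H²(p, q) = 1 − Σ_i √(p_i q_i) to the black-box probabilities p using plain gradient descent. The function names, the absence of an intercept and of LIME-style locality weights, and the optimizer are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(u):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    u = u - u.max(axis=1, keepdims=True)
    e = np.exp(u)
    return e / e.sum(axis=1, keepdims=True)


def hellinger_sq(p, q):
    """Squared Hellinger distance per row: 1 - sum_i sqrt(p_i * q_i)."""
    return 1.0 - np.sqrt(p * q).sum(axis=1)


def fit_explanation_matrix(Z, P, lr=1.0, steps=3000):
    """Hypothetical helper: fit W (k x d) so that softmax(Z @ W.T) approximates P.

    Z: (n, d) binary masks marking which features each perturbation keeps.
    P: (n, k) class probabilities the black-box model returns on those perturbations.
    Assumption: minimize the mean squared Hellinger distance with plain
    gradient descent, no intercept and no locality weights.
    """
    n, d = Z.shape
    k = P.shape[1]
    W = np.zeros((k, d))
    for _ in range(steps):
        Q = softmax(Z @ W.T)                             # surrogate distributions, (n, k)
        root = np.sqrt(P * Q)                            # elementwise sqrt(p_j * q_j)
        bc = root.sum(axis=1, keepdims=True)             # Bhattacharyya coefficient per row
        grad_logits = -0.5 * (root - Q * bc)             # d(H^2)/d(logits), (n, k)
        W -= lr * (grad_logits.T @ Z) / n                # chain rule through logits = Z @ W.T
    return W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.integers(0, 2, size=(200, 4)).astype(float)   # toy perturbation masks
    W_true = rng.normal(scale=0.5, size=(3, 4))
    W_true -= W_true.mean(axis=0, keepdims=True)          # remove the softmax-invariant shift
    P = softmax(Z @ W_true.T)                             # stand-in for black-box outputs
    W = fit_explanation_matrix(Z, P)
    print(np.round(W, 3))
    print(hellinger_sq(P, softmax(Z @ W.T)).mean())
```

In this sketch, row j of the fitted W describes how keeping each feature shifts the logit of class j, so a single matrix explains every class at once rather than only the predicted one. Adding the same vector to every row of W leaves softmax(Wz) unchanged, and since the per-sample logit gradients sum to zero across classes, starting from W = 0 keeps each column of W summing to zero, which makes the fitted matrix unique when the masks have full column rank.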