A Bayesian explanation of machine learning models based on modes and functional ANOVA
The proposed method provides a more human-intuitive and robust explanation of AI predictions by addressing a specific inverse problem in explainable AI.
The paper tackles the inverse explanation problem in XAI. It develops a Bayesian method that identifies the features responsible for a label's deviation from the mode, ranks them using functional ANOVA distances, and shows that the resulting explanations are more intuitive and robust than SHAP, while the extra cost of the Bayesian step is dimension-independent.
Most methods in explainable AI (XAI) focus on providing reasons for the prediction obtained from a given set of features. In contrast, we solve an inverse explanation problem: given the deviation of a label, find the reasons for this deviation. We use a Bayesian framework to recover the ``true'' features, conditioned on the observed label value. We efficiently explain the deviation of a label value from the mode by identifying and ranking the influential features using the ``distances'' in the functional ANOVA decomposition. We show that the new method is more human-intuitive and robust than methods based on mean values, e.g., SHapley Additive exPlanations (SHAP values). The extra cost of solving the Bayesian inverse problem is dimension-independent.
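For orientation, the two ingredients named in the abstract can be written in generic notation. The symbols used here ($f$ for the model, $x = (x_1, \dots, x_d)$ for the features, $y$ for the observed label, $p$ for densities) are assumptions introduced for illustration and are not taken from the paper. The decomposition shown is the classical mean-anchored functional ANOVA for independent inputs, not the paper's mode-based construction or its ``distances''.

Bayesian recovery of the features given the observed label:
\[
  p(x \mid y) \;\propto\; p(y \mid x)\, p(x).
\]

Classical functional ANOVA decomposition of the model:
\[
  f(x) \;=\; f_{\emptyset} \;+\; \sum_{i=1}^{d} f_i(x_i) \;+\; \sum_{1 \le i < j \le d} f_{ij}(x_i, x_j) \;+\; \cdots \;+\; f_{1,\dots,d}(x_1, \dots, x_d),
\]
where, in this classical form,
\[
  f_{\emptyset} = \mathbb{E}[f(X)], \qquad f_i(x_i) = \mathbb{E}[f(X) \mid X_i = x_i] - f_{\emptyset}.
\]

The abstract indicates that the method anchors explanations at the mode rather than at the mean $f_{\emptyset}$, which is the point of contrast with mean-based approaches such as SHAP.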