Evaluating Local Explainability Metrics for Machine Learning Models on Tabular Data
For practitioners applying explainability methods to tabular data, this study highlights that explanations may be unreliable and that their quality may depend more on data characteristics than on model accuracy.
This paper evaluates the trustworthiness of local explainability techniques (LIME, Kernel SHAP, Feature Ablation) on tabular classification tasks across 32 datasets, finding that explanation quality appears to be influenced primarily by dataset complexity and feature distributions rather than by model performance.
Despite the wide use of explainability techniques to understand the behavior of Artificial Intelligence (AI) models, the explanations they generate may not always be reliable. An explanation can appear plausible to humans yet fail to capture the internal reasoning of a model, particularly on complex tabular data. This paper studies the trustworthiness of local explainability techniques applied to complex tabular classification tasks, evaluating metrics for three main properties: faithfulness to the model's predictions, robustness to variations in the input data, and complexity of the explanation itself. Local Interpretable Model-Agnostic Explanations (LIME), Kernel SHapley Additive exPlanations (SHAP), and Feature Ablation were benchmarked across 32 datasets and several types of machine learning models. Model performance was analyzed to identify two groups of samples: consensus-correct, the samples that all models predicted correctly, and consensus-wrong, the samples that all models predicted incorrectly. The results show that explanation quality is not always correlated with a model's predictive performance. Instead, dataset complexity and feature distributions appear to be the main factors affecting explanation quality and reliability.
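The consensus grouping and the Feature Ablation technique described above can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes NumPy, models exposed as scikit-learn-style `predict_proba` callables, and a per-feature baseline vector (for example, training-set means). The function names `consensus_groups`, `feature_ablation`, and `explanation_complexity` are hypothetical, and the complexity measure (entropy of normalized absolute attributions) is one common formulation assumed here, not necessarily the metric used in the paper.

```python
import numpy as np


def consensus_groups(y_true, predictions):
    """Split sample indices into consensus-correct and consensus-wrong groups.

    predictions: array of shape (n_models, n_samples) holding each model's labels.
    """
    y_true = np.asarray(y_true)
    correct = np.asarray(predictions) == y_true  # broadcast over models
    consensus_correct = np.flatnonzero(correct.all(axis=0))    # every model right
    consensus_wrong = np.flatnonzero((~correct).all(axis=0))   # every model wrong
    return consensus_correct, consensus_wrong


def feature_ablation(predict_proba, x, baseline, target):
    """Attribution of feature j = drop in the target-class probability when
    feature j alone is replaced by its baseline value (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    base_score = predict_proba(x[None, :])[0, target]
    attributions = np.empty_like(x)
    for j in range(x.size):
        x_ablated = x.copy()
        x_ablated[j] = baseline[j]
        attributions[j] = base_score - predict_proba(x_ablated[None, :])[0, target]
    return attributions


def explanation_complexity(attributions, eps=1e-12):
    """Entropy of normalized absolute attributions (assumed metric):
    higher values mean importance is spread over more features."""
    a = np.abs(np.asarray(attributions, dtype=float))
    p = a / (a.sum() + eps)
    p = p[p > 0]  # 0 * log(0) is treated as 0
    return float(-(p * np.log(p)).sum())


# Toy usage: three models' predictions on four samples.
y_true = np.array([0, 1, 1, 0])
preds = np.array([[0, 1, 0, 1],
                  [0, 1, 0, 1],
                  [0, 0, 0, 1]])
cc, cw = consensus_groups(y_true, preds)
print(cc, cw)  # prints "[0] [2 3]"

# A hypothetical logistic model standing in for a trained classifier.
w = np.array([2.0, 0.0, -1.0])


def toy_predict_proba(X):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return np.column_stack([1.0 - p, p])


attr = feature_ablation(toy_predict_proba, [1.0, 1.0, 1.0], np.zeros(3), target=1)
complexity = explanation_complexity(attr)
```

Ablating one feature at a time keeps the sketch easy to audit: a feature the toy model ignores (zero weight) receives exactly zero attribution, while positively and negatively weighted features receive positive and negative attributions. The choice of baseline strongly affects Feature Ablation results and should be treated as a design decision rather than a fixed rule.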