LG · AI · Apr 11, 2025

Are We Merely Justifying Results ex Post Facto? Quantifying Explanatory Inversion in Post-Hoc Model Explanations

arXiv:2504.08919v1 · 1 citation · h-index: 24
Originality: Incremental advance
AI Analysis

This paper addresses a fundamental issue in interpretable AI that matters to both researchers and practitioners, though the advance is incremental because it builds on existing explanation methods.

The paper tackles the problem of post-hoc explanation methods potentially reversing the true input-output relationship. It proposes Inversion Quantification (IQ) to measure this effect and shows that methods like LIME and SHAP are prone to it, especially in the presence of spurious correlations. The authors also introduce Reproduce-by-Poking (RBP), an enhancement that reduces inversion by 1.8% on average on synthetic data.

Post-hoc explanation methods provide interpretation by attributing predictions to input features. Natural explanations are expected to interpret how the inputs lead to the predictions. Thus, a fundamental question arises: Do these explanations unintentionally reverse the natural relationship between inputs and outputs? Specifically, are the explanations rationalizing predictions from the output rather than reflecting the true decision process? To investigate such explanatory inversion, we propose Inversion Quantification (IQ), a framework that quantifies the degree to which explanations rely on outputs and deviate from faithful input-output relationships. Using the framework, we demonstrate on synthetic datasets that widely used methods such as LIME and SHAP are prone to such inversion, particularly in the presence of spurious correlations, across tabular, image, and text domains. Finally, we propose Reproduce-by-Poking (RBP), a simple and model-agnostic enhancement to post-hoc explanation methods that integrates forward perturbation checks. We further show that under the IQ framework, RBP theoretically guarantees the mitigation of explanatory inversion. Empirically, for example, on the synthesized data, RBP can reduce the inversion by 1.8% on average across iconic post-hoc explanation approaches and domains.
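
The abstract describes RBP only as integrating "forward perturbation checks" and gives no algorithmic detail. The sketch below is therefore not the authors' method. It illustrates the general idea of such a check under stated assumptions: perturb ("poke") the features an explanation ranks highest and see whether the model's output actually moves. The function name `forward_perturbation_check`, the toy linear model, the zero baseline, and the attribution values are all hypothetical and chosen for illustration.

```python
import numpy as np


def forward_perturbation_check(predict, x, attributions, baseline, top_k=2, tol=1e-6):
    """Poke the top_k attributed features and report whether the output changes.

    Returns a list of (feature_index, output_change, passed) tuples, ordered
    from the most to the least strongly attributed feature.
    """
    x = np.asarray(x, dtype=float)
    # Rank features by absolute attribution, largest first.
    order = np.argsort(-np.abs(attributions))[:top_k]
    original = predict(x)
    results = []
    for i in order:
        poked = x.copy()
        poked[i] = baseline[i]  # replace one feature with its baseline value
        change = abs(predict(poked) - original)
        results.append((int(i), float(change), bool(change > tol)))
    return results


# Hypothetical toy model: uses features 0 and 1, ignores feature 2 entirely.
weights = np.array([2.0, -1.0, 0.0])


def predict(x):
    return float(weights @ x)


x = np.array([1.0, 1.0, 1.0])
baseline = np.zeros(3)

# Hypothetical explanation that credits the unused feature 2 most heavily,
# as might happen if feature 2 were spuriously correlated with the output.
attributions = np.array([0.5, 0.1, 0.9])

results = forward_perturbation_check(predict, x, attributions, baseline, top_k=2)
for index, change, passed in results:
    print(f"feature {index}: output change {change:.2f}, passed={passed}")
```

In this toy setup, feature 2 receives the largest attribution but fails the check because poking it leaves the prediction unchanged, while feature 0 passes. How RBP actually uses such checks to adjust an explanation, and how IQ scores the inversion, is specified in the paper rather than here.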

