TaylorPODA: A Taylor Expansion-Based Method to Improve Post-Hoc Attributions for Opaque Models
This work offers an incremental improvement to the theoretical grounding of explanations for opaque models, a property that matters for deploying trustworthy AI in domains that require interpretability.
The paper addresses the problem of quantifying individual feature contributions in post-hoc, model-agnostic explanations of opaque models. It proposes TaylorPODA, a Taylor expansion-based method governed by new postulates and equipped with an adaptation property. Empirical results show that it performs competitively against baseline methods while providing principled explanations.
Existing post-hoc model-agnostic methods generate external explanations for opaque models, primarily by locally attributing the model output to its input features. However, they often lack an explicit and systematic framework for quantifying the contribution of individual features. Building on the Taylor expansion framework introduced by Deng et al. (2024) to unify existing local attribution methods, we propose a rigorous set of postulates ("precision", "federation", and "zero-discrepancy") to govern Taylor term-specific attribution. Guided by these postulates, we introduce TaylorPODA (Taylor expansion-derived imPortance-Order aDapted Attribution), which incorporates an additional "adaptation" property. This property enables alignment with task-specific goals, especially in post-hoc settings lacking ground-truth explanations. Empirical evaluations demonstrate that TaylorPODA achieves competitive results against baseline methods, providing principled and visualization-friendly explanations. This work enhances the trustworthy deployment of opaque models by offering explanations with stronger theoretical grounding.
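The abstract describes attribution at the level of individual Taylor terms without showing what that looks like. The Python sketch below is a toy illustration under stated assumptions, not TaylorPODA itself. It uses a hypothetical multilinear model with a zero baseline, for which each monomial is exactly one Taylor term (no expansion residual), and splits each term's effect only among the features that term involves, with the shares for each term summing to one so that the attributions add up to f(x) - f(0). The default even split is, as we read Deng et al. (2024), how the Shapley value allocates interaction terms; `shapley_exact` checks this by brute force. The `shares` argument is a hypothetical hook that exposes the remaining degree of freedom. The abstract does not say how TaylorPODA sets such weights, and the skewed shares used below are arbitrary placeholders.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy model: multilinear in 3 features, baseline b = 0.
# Keys are the feature subsets S a term involves; values are coefficients c_S.
# For such a model, each monomial is exactly one Taylor term (no residual).
COEFFS = {(0,): 1.0, (1,): 2.0, (0, 1): 3.0, (0, 1, 2): -1.0}
N_FEATURES = 3


def term_effect(subset, coeff, x):
    """Effect of one Taylor term at x relative to the zero baseline."""
    effect = coeff
    for i in subset:
        effect *= x[i]
    return effect


def f(x, coeffs=COEFFS):
    """Evaluate f(x) = sum_S c_S * prod_{i in S} x_i."""
    return sum(term_effect(s, c, x) for s, c in coeffs.items())


def term_attribution(x, shares=None, coeffs=COEFFS, n=N_FEATURES):
    """Split each Taylor term only among the features it involves.

    shares[S] optionally maps each feature in S to its fraction of term S
    (hypothetical hook; NOT TaylorPODA's rule). Fractions must sum to 1 so
    no part of any term is lost or duplicated. Missing entries default to
    an even split, the Shapley-style allocation of interaction terms.
    """
    shares = shares or {}
    phi = [0.0] * n
    for subset, coeff in coeffs.items():
        w = shares.get(subset, {i: 1.0 / len(subset) for i in subset})
        if set(w) != set(subset) or abs(sum(w.values()) - 1.0) > 1e-9:
            raise ValueError(f"shares for {subset} must cover it and sum to 1")
        effect = term_effect(subset, coeff, x)
        for i in subset:
            phi[i] += w[i] * effect
    return phi


def shapley_exact(x, n=N_FEATURES):
    """Brute-force Shapley values; absent features are set to the baseline 0."""
    def v(coalition):
        return f([x[i] if i in coalition else 0.0 for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            for t in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (v(set(t) | {i}) - v(set(t)))
    return phi


x = [1.0, 2.0, 3.0]
even = term_attribution(x)
# Arbitrary placeholder shares for the (0, 1) interaction term.
skewed = term_attribution(x, shares={(0, 1): {0: 0.25, 1: 0.75}})
print(even)  # [2.0, 5.0, -2.0]
print(shapley_exact(x))  # agrees with `even` up to floating-point rounding
print(sum(skewed), f(x))  # 5.0 5.0
```

With x = (1, 2, 3), the even split gives (2, 5, -2) and the skewed shares give (0.5, 6.5, -2); both sum to f(x) = 5. How interaction terms are divided changes the individual attributions without changing their total, which is the kind of freedom a task-adapted allocation could exploit.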