AI, Nov 3, 2025

llmSHAP: A Principled Approach to LLM Explainability

Filip Naudot, Tobias Sundqvist, Timotheus Kampik

arXiv:2511.01311v1, 5.81 citations, h-index: 12

Originality: Synthesis-oriented

AI Analysis

This work addresses explainability for LLM-based decision support, but its contribution is incremental: it adapts existing Shapley value methods to stochastic inference.

The paper tackles the challenge of applying Shapley value-based feature attribution to stochastic large language models (LLMs) in decision support systems, analyzing when principles can be guaranteed and highlighting trade-offs in speed, accuracy, and principle attainment.

Feature attribution methods help make machine learning-based inference explainable by determining how much one or several features have contributed to a model's output. A particularly popular attribution method is based on the Shapley value from cooperative game theory, a measure that guarantees the satisfaction of several desirable principles, assuming deterministic inference. We apply the Shapley value to feature attribution in large language model (LLM)-based decision support systems, where inference is, by design, stochastic (non-deterministic). We then demonstrate when we can and cannot guarantee Shapley value principle satisfaction across different implementation variants applied to LLM-based decision support, and analyze how the stochastic nature of LLMs affects these guarantees. We also highlight trade-offs between explainable inference speed, agreement with exact Shapley value attributions, and principle attainment.
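To make the abstract's starting point concrete, here is a minimal sketch of exact Shapley value attribution for a deterministic value function. It is an illustration only, not the paper's implementation: the function `exact_shapley`, the feature names, the `WEIGHTS` table, and the scoring rule in `toy_value` are all hypothetical stand-ins for a real model call.

```python
from itertools import combinations
from math import factorial


def exact_shapley(features, value):
    """Return exact Shapley values for each feature.

    `value` maps a frozenset of features (a coalition) to a float.
    The cost is exponential in len(features), so this only suits small inputs.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical deterministic "model": additive weights plus one interaction.
WEIGHTS = {"age": 2.0, "symptom": 3.0, "history": 1.0}


def toy_value(coalition):
    score = sum(WEIGHTS[f] for f in coalition)
    if {"age", "symptom"} <= coalition:
        score += 1.0  # interaction bonus, split equally by the Shapley value
    return score


phi = exact_shapley(list(WEIGHTS), toy_value)
all_features = frozenset(WEIGHTS)
# Efficiency: attributions sum to value(all features) - value(no features).
gap = sum(phi.values()) - (toy_value(all_features) - toy_value(frozenset()))
print(phi, gap)
```

Because `toy_value` returns the same score every time it sees the same coalition, the attributions sum to the difference between the full and empty coalition scores (the efficiency principle). If `value` were instead a stochastic LLM call re-sampled on every invocation, the same coalition could receive different scores within one computation, which is the situation the abstract describes as putting such guarantees at risk.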
