Succinct Interaction-Aware Explanations
This work targets users who need clearer model explanations in machine learning applications. The contribution is incremental, as it builds upon the existing SHAP and NSHAP methods.
The paper tackles the problem of generating interpretable explanations for black-box models. SHAP ignores feature interactions, while NSHAP produces exponentially large outputs. The proposed method addresses both limitations by partitioning the features into interacting sets, which yields succinct and accurate explanations.
SHAP is a popular approach to explain black-box models by revealing the importance of individual features. As it ignores feature interactions, SHAP explanations can be confusing or even misleading. NSHAP, on the other hand, reports the additive importance for all subsets of features. While this does include all interacting sets of features, it also leads to an exponentially sized, difficult-to-interpret explanation. In this paper, we propose to combine the best of these two worlds by partitioning the features into parts that significantly interact, and using these parts to compose a succinct, interpretable, additive explanation. We derive a criterion by which to measure the representativeness of such a partition for a model's behavior, traded off against the complexity of the resulting explanation. To efficiently find the best partition out of super-exponentially many, we show how to prune sub-optimal solutions using a statistical test, which not only improves runtime but also helps to detect spurious interactions. Experiments on synthetic and real-world data show that our explanations are more accurate and more easily interpretable, respectively, than those of SHAP and NSHAP.
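To make the idea of a partition-based additive explanation concrete, the following is a minimal sketch, not the paper's algorithm. It computes exact Shapley values twice for a toy model with one interaction: once with individual features as players (as SHAP does), and once with the parts of a fixed partition as players. The toy `model`, the zero `baseline` used to mask absent features, and the helper names `shapley` and `value` are all assumptions made for illustration. The partition is given by hand; the paper's selection criterion and statistical test are not reproduced here.

```python
from itertools import combinations
from math import factorial


def model(x):
    # Hypothetical toy model: feature 0 acts alone, features 1 and 2 interact.
    return x[0] + x[1] * x[2]


x = (1.0, 1.0, 1.0)         # instance to explain (assumed)
baseline = (0.0, 0.0, 0.0)  # value substituted for "absent" features (assumed)


def value(subset):
    # Value function: model output with only the features in `subset` present.
    masked = [x[i] if i in subset else baseline[i] for i in range(len(x))]
    return model(masked)


def shapley(players, value):
    # Exact Shapley values where each player is a frozenset of feature indices.
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset().union(*coalition)
                total += weight * (value(s | p) - value(s))
        phi[p] = total
    return phi


# Individual features as players: the interaction is split across 1 and 2.
singletons = [frozenset({i}) for i in range(len(x))]
phi_single = shapley(singletons, value)

# Parts of a hand-picked partition as players: {1, 2} is reported as one unit.
partition = [frozenset({0}), frozenset({1, 2})]
phi_parts = shapley(partition, value)

for part, contribution in phi_parts.items():
    print(sorted(part), round(contribution, 3))
```

In this toy setting the per-feature explanation assigns 1 to feature 0 and 0.5 each to features 1 and 2, which hides the fact that features 1 and 2 only matter jointly. The partition-based explanation assigns 1 to the part {0} and 1 to the part {1, 2}. Both explanations sum to the difference between the model output on `x` and on `baseline`, but the second has fewer terms and names the interacting set directly.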