Proxy-Based Approximation of Shapley and Banzhaf Interactions

Santo M. A. R. Thies, Hubert Baniecki, R. Teal Witter, Eyke Hüllermeier, Maximilian Muschalik, Fabian Fumagalli

arXiv:2605.2273877.8

AI Analysis

This work provides a practical, accurate estimator for higher-order feature interactions in machine learning, addressing a key bottleneck in model interpretability.

ProxySHAP introduces a proxy-based approximation for Shapley and Banzhaf interactions. It achieves state-of-the-art approximation quality in both small- and large-budget regimes, outperforming prior methods such as ProxySPEX and KernelSHAP-IQ, and scales to large applications with thousands of features.

Shapley and Banzhaf interactions capture the complex dynamics inherent in modern machine learning applications. However, current estimators for these higher-order interactions force a trade-off between speed and accuracy. To overcome this limitation, we introduce ProxySHAP. ProxySHAP reconciles the high sample efficiency of tree-based proxy models with a principled path to consistency via residual correction. On a theoretical level, we derive a polynomial-time generalization of interventional TreeSHAP that computes exact interaction indices for tree ensembles, bypassing the exponential tree-depth dependence of prior methods. Furthermore, we formally analyze the residual adjustment strategy, characterizing the specific conditions under which Maximum Sample Reuse (MSR) corrects proxy bias without its variance scaling exponentially with interaction size. Extensive benchmarking demonstrates that ProxySHAP sets a new state of the art in approximation quality, including in large-scale applications with thousands of features. By achieving the lowest error in both small- and large-budget regimes, ProxySHAP significantly outperforms the previous best estimators, ProxySPEX and KernelSHAP-IQ, while also delivering superior performance on downstream explainability tasks.
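To make the estimated quantities concrete, the following Python sketch computes exact Banzhaf and Shapley interactions by brute-force enumeration on a toy game. It then contrasts a plain MSR-style estimate with a residual-corrected one: the exact interaction of a proxy plus an MSR estimate on the residual between game and proxy. This is a minimal illustration under stated assumptions, not the authors' ProxySHAP implementation. The toy game, the additive proxy, and all function names (`banzhaf_interaction`, `shapley_interaction`, `msr_banzhaf_interaction`, `residual_corrected`) are hypothetical. The proxy's interactions are enumerated exhaustively rather than computed with the paper's TreeSHAP generalization. The MSR variant shown averages sampled coalitions within each stratum A ∩ S = L; it is one common way to extend Maximum Sample Reuse to interactions and may differ from the estimator analyzed in the paper.

```python
import itertools
import math
import random


def subsets(items):
    """Yield all subsets of `items` as frozensets."""
    items = sorted(items)
    for r in range(len(items) + 1):
        for combo in itertools.combinations(items, r):
            yield frozenset(combo)


def discrete_derivative(v, S, T):
    """Delta_S v(T): signed sum of v(T | L) over all subsets L of S."""
    return sum((-1) ** (len(S) - len(L)) * v(T | L) for L in subsets(S))


def banzhaf_interaction(v, n, S):
    """Exact Banzhaf interaction: uniform average of Delta_S v(T) over T outside S."""
    S = frozenset(S)
    rest = [i for i in range(n) if i not in S]
    return sum(discrete_derivative(v, S, T) for T in subsets(rest)) / 2 ** len(rest)


def shapley_interaction(v, n, S):
    """Exact Shapley interaction index with weights (n-t-s)! t! / (n-s+1)!."""
    S = frozenset(S)
    rest = [i for i in range(n) if i not in S]
    s = len(S)
    total = 0.0
    for T in subsets(rest):
        t = len(T)
        weight = math.factorial(n - t - s) * math.factorial(t) / math.factorial(n - s + 1)
        total += weight * discrete_derivative(v, S, T)
    return total


def msr_banzhaf_interaction(v, S, samples):
    """MSR-style estimate (assumed variant): every uniformly sampled coalition A is
    reused, grouped by its stratum A & S, and stratum means are combined with signs."""
    S = frozenset(S)
    sums, counts = {}, {}
    for A in samples:
        key = A & S
        sums[key] = sums.get(key, 0.0) + v(A)
        counts[key] = counts.get(key, 0) + 1
    estimate = 0.0
    for L in subsets(S):
        if L not in counts:
            raise ValueError("a stratum received no samples; increase the budget")
        estimate += (-1) ** (len(S) - len(L)) * sums[L] / counts[L]
    return estimate


def residual_corrected(v, proxy, n, S, samples):
    """Hypothetical residual correction: exact proxy interaction plus an MSR
    estimate on the residual v - proxy (brute force stands in for TreeSHAP)."""
    residual = lambda A: v(A) - proxy(A)
    return banzhaf_interaction(proxy, n, S) + msr_banzhaf_interaction(residual, S, samples)


# Hypothetical toy game: additive main effects plus an interaction between players 0 and 1.
n = 4
w = [1.0, 2.0, -1.0, 0.5]
game = lambda A: sum(w[i] for i in A) + 3.0 * (0 in A and 1 in A)
proxy = lambda A: sum(w[i] for i in A)  # captures main effects, misses the interaction

# Uniform coalition sampling: each player is included independently with probability 1/2.
rng = random.Random(0)
samples = [frozenset(i for i in range(n) if rng.random() < 0.5) for _ in range(200)]

print(banzhaf_interaction(game, n, {0, 1}))  # 3.0
print(shapley_interaction(game, n, {0, 1}))
print(msr_banzhaf_interaction(game, {0, 1}, samples))  # unbiased but noisy
print(residual_corrected(game, proxy, n, {0, 1}, samples))  # 3.0
```

With 200 samples and a pairwise interaction, each of the four strata holds roughly 50 coalitions; for an interaction of order k the budget is split across 2^k strata, which gives one intuition for why uncorrected MSR variance can grow with interaction size. In this toy game the additive proxy absorbs all variation from the other players, so the residual estimate is exact; a realistic proxy would only shrink the residual, not remove it.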
