An Odd Estimator for Shapley Values

Fabian Fumagalli, Landon Butler, Justin Singh Kang, Kannan Ramchandran, R. Teal Witter

arXiv:2602.01399v1 · 5.82 citations

Originality: Highly original

AI Analysis

This work provides a theoretical justification for paired sampling in Shapley value estimation and offers practitioners a more efficient method for ML attribution tasks. It is incremental in that it builds on existing approximation techniques.

The authors address the efficient approximation of Shapley values, which are central to attribution in machine learning but intractable to compute exactly. They propose OddSHAP, an estimator that exploits the odd component of set functions and achieves state-of-the-art estimation accuracy in benchmarks.

The Shapley value is a ubiquitous framework for attribution in machine learning, encompassing feature importance, data valuation, and causal inference. However, its exact computation is generally intractable, necessitating efficient approximation methods. While the most effective and popular estimators leverage the paired sampling heuristic to reduce estimation error, the theoretical mechanism driving this improvement has remained opaque. In this work, we provide an elegant and fundamental justification for paired sampling: we prove that the Shapley value depends exclusively on the odd component of the set function, and that paired sampling orthogonalizes the regression objective to filter out the irrelevant even component. Leveraging this insight, we propose OddSHAP, a novel consistent estimator that performs polynomial regression solely on the odd subspace. By utilizing the Fourier basis to isolate this subspace and employing a proxy model to identify high-impact interactions, OddSHAP overcomes the combinatorial explosion of higher-order approximations. Through an extensive benchmark evaluation, we find that OddSHAP achieves state-of-the-art estimation accuracy.
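The claim that the Shapley value depends only on the odd component can be checked numerically on a small game. The sketch below is not the authors' OddSHAP estimator. It assumes the odd and even components are defined by complement symmetry, odd(S) = (v(S) - v(N\S)) / 2 and even(S) = (v(S) + v(N\S)) / 2, which is one natural reading of the abstract. The helper names (`shapley_values`, `split_odd_even`, `random_game`) are hypothetical and chosen for illustration only.

```python
import itertools
import math
import random


def shapley_values(n, v):
    """Exact Shapley values of a set function v (dict keyed by frozenset).

    Enumerates all subsets, so this is only feasible for small n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Classical Shapley weight |S|! (n - |S| - 1)! / n!
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (v[s | {i}] - v[s])
    return phi


def split_odd_even(n, v):
    """Split v into parts that are odd and even under set complement.

    Assumed definition (not taken from the paper's text):
    odd(S) = (v(S) - v(N \\ S)) / 2, even(S) = (v(S) + v(N \\ S)) / 2.
    """
    full = frozenset(range(n))
    odd = {s: 0.5 * (v[s] - v[full - s]) for s in v}
    even = {s: 0.5 * (v[s] + v[full - s]) for s in v}
    return odd, even


def random_game(n, seed=0):
    """Random set function with an independent Gaussian value per subset."""
    rng = random.Random(seed)
    return {
        frozenset(c): rng.gauss(0.0, 1.0)
        for size in range(n + 1)
        for c in itertools.combinations(range(n), size)
    }


n = 5
v = random_game(n)
odd, even = split_odd_even(n, v)

phi_full = shapley_values(n, v)
phi_odd = shapley_values(n, odd)
phi_even = shapley_values(n, even)

# Largest deviation between Shapley values of v and of its odd part,
# and largest Shapley value (in magnitude) of the even part.
max_gap = max(abs(a - b) for a, b in zip(phi_full, phi_odd))
max_even = max(abs(e) for e in phi_even)
print(f"max |phi(v) - phi(odd)| = {max_gap:.2e}")
print(f"max |phi(even)|         = {max_even:.2e}")
```

Both printed quantities should be zero up to floating-point error. Under the assumed definition, pairing each coalition S with the complement of S ∪ {i} leaves the Shapley weight unchanged and flips the sign of the even component's marginal contribution, so those contributions cancel. This is also the intuition for why sampling a coalition together with its complement helps: the pair carries exactly the information needed to separate the two components.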
