DS LG MLJan 20, 2024

Efficient Data Shapley for Weighted Nearest Neighbor Algorithms

Jiachen T. Wang, Prateek Mittal, Ruoxi Jia

arXiv:2401.11103v111.316 citationsAISTATS

Originality Highly original

AI Analysis

This addresses a computational bottleneck in data valuation for machine learning practitioners, offering a more efficient method for assessing data contributions in weighted nearest neighbor models.

The paper tackled the efficient computation of Data Shapley for weighted K nearest neighbor algorithms by reframing it as a counting problem and introducing a quadratic-time algorithm, improving from O(N^K) to O(N^2) and demonstrating superior performance in discerning data quality.

This work aims to address an open problem in data valuation literature concerning the efficient computation of Data Shapley for weighted $K$ nearest neighbor algorithm (WKNN-Shapley). By considering the accuracy of hard-label KNN with discretized weights as the utility function, we reframe the computation of WKNN-Shapley into a counting problem and introduce a quadratic-time algorithm, presenting a notable improvement from $O(N^K)$, the best result from existing literature. We develop a deterministic approximation algorithm that further improves computational efficiency while maintaining the key fairness properties of the Shapley value. Through extensive experiments, we demonstrate WKNN-Shapley's computational efficiency and its superior performance in discerning data quality compared to its unweighted counterpart.

View on arXiv PDF

Similar