GTLG | Nov 1, 2024

Towards Data Valuation via Asymmetric Data Shapley

arXiv:2411.00388v3 | 1 citation | h-index: 3 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses data valuation for machine learning practitioners and data market participants. It is an incremental advance: it modifies an existing method, data Shapley, so that it can account for dataset structures.

The paper tackles the problem of accurately quantifying data value in algorithmic decision-making. It extends the traditional data Shapley framework to asymmetric data Shapley, which incorporates dataset structures for structure-aware valuation, and demonstrates its applicability across various machine learning tasks and data market contexts.

As data emerges as a vital driver of technological and economic advancements, a key challenge is accurately quantifying its value in algorithmic decision-making. The Shapley value, a well-established concept from cooperative game theory, has been widely adopted to assess the contribution of individual data sources in supervised machine learning. However, its symmetry axiom assumes all players in the cooperative game are homogeneous, which overlooks the complex structures and dependencies present in real-world datasets. To address this limitation, we extend the traditional data Shapley framework to asymmetric data Shapley, making it flexible enough to incorporate inherent structures within the datasets for structure-aware data valuation. We also introduce an efficient $k$-nearest neighbor-based algorithm for its exact computation. We demonstrate the practical applicability of our framework across various machine learning tasks and data market contexts. The code is available at: https://github.com/xzheng01/Asymmetric-Data-Shapley.
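To make the effect of the symmetry axiom concrete, here is a minimal sketch on a toy game. It is not the paper's $k$-nearest neighbor algorithm and is not taken from the linked repository. It assumes one common way to realize an asymmetric Shapley value: averaging marginal contributions only over the permutations that respect a given precedence ordering, rather than over all permutations. The names `shapley`, `utility`, and `a_before_b`, the three players, and the utility values are all hypothetical, and the brute-force enumeration is only feasible for a handful of players.

```python
from fractions import Fraction
from itertools import permutations


def shapley(players, utility, allowed=lambda order: True):
    """Average each player's marginal contribution over the permutations
    accepted by `allowed`. With the default `allowed`, every permutation
    counts and this is the classical (symmetric) Shapley value."""
    totals = {p: Fraction(0) for p in players}
    count = 0
    for order in permutations(players):
        if not allowed(order):
            continue
        count += 1
        coalition = frozenset()
        for p in order:
            grown = coalition | {p}
            totals[p] += Fraction(utility(grown)) - Fraction(utility(coalition))
            coalition = grown
    return {p: totals[p] / count for p in players}


# Hypothetical toy game: "a" and "b" are redundant (either one alone
# contributes 1), while "c" independently contributes 2.
players = ["a", "b", "c"]


def utility(coalition):
    redundant_part = 1 if coalition & {"a", "b"} else 0
    independent_part = 2 if "c" in coalition else 0
    return redundant_part + independent_part


# Assumed structure for illustration: "a" must precede "b" in every ordering.
def a_before_b(order):
    return order.index("a") < order.index("b")


sym = shapley(players, utility)               # all 6 permutations
asym = shapley(players, utility, a_before_b)  # only the 3 with "a" first

print("symmetric: ", sym)
print("asymmetric:", asym)
```

In this toy game the symmetric value splits the redundant contribution evenly, giving "a" and "b" 1/2 each, whereas the ordering-restricted value assigns 1 to "a" and 0 to "b". Player "c" receives 2 in both cases, and both sets of values still sum to the utility of the full coalition.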

Code Implementations: 1 repo
