LG · AI · May 2, 2023

Data valuation: The partial ordinal Shapley value for machine learning

arXiv:2305.01660v1 · 1 citation · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a specific gap in data valuation for machine learning, but it is incremental as it builds on existing Shapley value methods with a focus on order and computational efficiency.

The paper incorporates the order of data cooperation into Shapley-value-based data valuation by defining a partial ordinal Shapley value using group theory. Because computing this value exactly has exponential complexity, it also proposes three approximation algorithms.

Data valuation using the Shapley value has emerged as a prevalent research domain in machine learning applications. However, the role of order in data cooperation remains a challenge, as most research does not discuss it. To tackle this problem, this paper defines the partial ordinal Shapley value using group theory from abstract algebra. Since computing the partial ordinal Shapley value exactly requires exponential time, the paper also gives three algorithms for approximating it. The Truncated Monte Carlo algorithm is derived from the classic Shapley value approximation algorithm. The Classification Monte Carlo algorithm and the Classification Truncated Monte Carlo algorithm rely on the observation that data points in the same class provide similar information, so the calculation can be accelerated by leaving out some data points in each class.
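
To make the approximation ideas in the abstract concrete, here is a minimal sketch of the classic Truncated Monte Carlo estimator for the ordinary Shapley value, plus a per-class subsampling helper in the spirit of the "leave out some data points in each class" idea. It does not implement the paper's partial ordinal Shapley value, and it is not taken from the paper's code. The function names (`tmc_shapley`, `subsample_per_class`), the `utility` callback, and the parameters `tolerance` and `keep_fraction` are all hypothetical names chosen for illustration.

```python
import random
from collections import defaultdict


def tmc_shapley(n_points, utility, n_permutations=200, tolerance=1e-3, seed=0):
    """Estimate ordinary Shapley values by sampling random permutations.

    `utility` is an assumed callback: it takes a list of data-point indices
    and returns a score (for example, validation accuracy of a model trained
    on those points).
    """
    rng = random.Random(seed)
    full_score = utility(list(range(n_points)))
    values = [0.0] * n_points

    for t in range(1, n_permutations + 1):
        perm = list(range(n_points))
        rng.shuffle(perm)
        prefix = []
        prev_score = utility([])
        for idx in perm:
            if abs(full_score - prev_score) < tolerance:
                # Truncation: the prefix already scores close to the full
                # dataset, so remaining marginal gains are treated as zero.
                marginal = 0.0
            else:
                prefix.append(idx)
                new_score = utility(prefix)
                marginal = new_score - prev_score
                prev_score = new_score
            # Running mean of this point's marginal contribution.
            values[idx] += (marginal - values[idx]) / t
    return values


def subsample_per_class(labels, keep_fraction=0.5, seed=0):
    """Keep a random fraction of indices from each class (at least one)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    kept = []
    for members in by_class.values():
        k = max(1, int(keep_fraction * len(members)))
        kept.extend(rng.sample(members, k))
    return sorted(kept)
```

One possible use, under the same assumptions: call `subsample_per_class` on the training labels first, then run `tmc_shapley` over only the kept indices, so that each permutation requires fewer utility evaluations. How the paper assigns values to the points that were left out is not stated in the abstract, so that step is omitted here.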

Code Implementations: 1 repo