ML, LG | Oct 30, 2022

Variance reduced Shapley value estimation for trustworthy data valuation

arXiv:2210.16835v5 | 30 citations | h-index: 30
Originality: Incremental advance
AI Analysis

This work addresses the need for reliable data valuation in data marketplaces, offering an incremental improvement over existing methods.

The paper tackles the high variance of Shapley value estimation for data valuation by proposing a variance-reduced method based on stratified sampling, which the authors report gives more robust and efficient data valuation across various datasets and applications.

Data valuation, especially quantifying the value of data in algorithmic prediction and decision-making, is a fundamental problem in data trading scenarios. The most widely used method is to define the data Shapley value and approximate it with the permutation sampling algorithm. To address the large estimation variance of permutation sampling, which hinders the development of the data marketplace, we propose a more robust data valuation method using stratified sampling, named variance reduced data Shapley (VRDS for short). We show theoretically how to stratify and how many samples to take at each stratum, and we give a sample complexity analysis of VRDS. Finally, the effectiveness of VRDS is illustrated on different types of datasets and in data removal applications.
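To make the contrast in the abstract concrete, here is a minimal sketch of the two estimators for a single data point's Shapley value: plain permutation sampling, and a stratified estimator that groups coalitions by size. It is an illustration under stated assumptions, not the authors' VRDS implementation. Stratifying by coalition size and drawing the same number of samples in every stratum are assumptions made here; the paper derives its own allocation, which this page does not reproduce. The function names, `WEIGHTS`, and `toy_utility` are hypothetical.

```python
import random

# Hypothetical toy data: one weight per data point.
WEIGHTS = [1, 2, 3, 4]

def toy_utility(subset):
    """Hypothetical non-additive utility: squared sum of the weights in the coalition."""
    s = sum(WEIGHTS[j] for j in subset)
    return s * s

def marginal(utility, subset, i):
    """Marginal contribution of point i when added to the coalition `subset`."""
    return utility(subset | {i}) - utility(subset)

def permutation_shapley(utility, n, i, num_samples, rng):
    """Permutation sampling: average i's marginal over random orderings of all points."""
    players = list(range(n))
    total = 0.0
    for _ in range(num_samples):
        rng.shuffle(players)
        # The coalition is every point that precedes i in the sampled ordering.
        preceding = frozenset(players[:players.index(i)])
        total += marginal(utility, preceding, i)
    return total / num_samples

def stratified_shapley(utility, n, i, samples_per_stratum, rng):
    """Stratified sampling: one stratum per coalition size k = 0..n-1.

    Assumption: equal allocation (same sample count in every stratum).
    The Shapley value is the unweighted mean of the per-size expectations.
    """
    others = [j for j in range(n) if j != i]
    stratum_means = []
    for k in range(n):
        total = 0.0
        for _ in range(samples_per_stratum):
            # Uniformly random coalition of size k that excludes i.
            subset = frozenset(rng.sample(others, k))
            total += marginal(utility, subset, i)
        stratum_means.append(total / samples_per_stratum)
    return sum(stratum_means) / n

if __name__ == "__main__":
    rng = random.Random(0)
    n = len(WEIGHTS)
    print("permutation:", permutation_shapley(toy_utility, n, 3, 800, rng))
    print("stratified: ", stratified_shapley(toy_utility, n, 3, 200, rng))
```

For this toy utility the exact Shapley value of point i is its weight times the total weight (40 for the last point), so both estimates can be checked against a known target. Both calls above use 800 utility-difference evaluations in total, which makes the comparison of their spread across seeds a fair one.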

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
