LG · ML · Feb 27, 2019

Towards Efficient Data Valuation Based on the Shapley Value

arXiv:1902.10275v3 · 551 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for fair data valuation in settings such as profit distribution and data-breach compensation; its contribution is an incremental improvement in computational efficiency.

The paper tackles the high computational cost of valuing data with the Shapley value by proposing efficient approximation algorithms and applying them to benchmark datasets.
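For reference, the quantity being approximated can be written in the standard Shapley form. Here I is the set of N training points, U(S) is a utility assigned to a subset S (for example, the validation accuracy of a model trained on S), Π(I) is the set of all orderings of I, and P_i^π is the set of points that precede i in ordering π. This notation is chosen for illustration and may differ from the paper's.

```latex
s_i = \frac{1}{N} \sum_{S \subseteq I \setminus \{i\}} \binom{N-1}{|S|}^{-1} \bigl[ U(S \cup \{i\}) - U(S) \bigr]
    = \frac{1}{N!} \sum_{\pi \in \Pi(I)} \bigl[ U(P_i^{\pi} \cup \{i\}) - U(P_i^{\pi}) \bigr]
```

The first form sums over 2^(N-1) subsets for every point, which is where the exponential cost comes from; the second, equivalent permutation form is the one that sampling-based estimators draw from.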

"How much is my data worth?" is an increasingly common question posed by organizations and individuals alike. An answer to this question could allow, for instance, fairly distributing profits among multiple data contributors and determining prospective compensation when data breaches happen. In this paper, we study the problem of data valuation by utilizing the Shapley value, a popular notion of value which originated in cooperative game theory. The Shapley value defines a unique payoff scheme that satisfies many desiderata for the notion of data value. However, the Shapley value often requires exponential time to compute. To meet this challenge, we propose a repertoire of efficient algorithms for approximating the Shapley value. We also demonstrate the value of each training instance for various benchmark datasets.
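To make the exponential cost concrete, here is a minimal sketch in standard-library Python that computes data Shapley values exactly by enumerating subsets and, alternatively, estimates them by Monte Carlo permutation sampling. Permutation sampling is a generic baseline estimator and is not a reproduction of the paper's own algorithms. The 1-nearest-neighbour utility, the toy data (TRAIN_X, TRAIN_Y, VAL_X, VAL_Y), the convention that an empty training set scores zero, and all function names are hypothetical choices made for illustration.

```python
import itertools
import math
import random
from typing import Callable, FrozenSet, List

# A utility maps a subset of training-point indices to a score.
Utility = Callable[[FrozenSet[int]], float]

# Hypothetical toy data: 1-D points with binary labels.
TRAIN_X = [0.0, 1.0, 2.0, 3.0, 10.0]
TRAIN_Y = [0, 0, 1, 1, 0]
VAL_X = [0.5, 2.5, 2.8, 0.2]
VAL_Y = [0, 1, 1, 0]


def knn_utility(subset: FrozenSet[int]) -> float:
    """Validation accuracy of a 1-nearest-neighbour classifier fit on `subset`."""
    if not subset:
        return 0.0  # assumed convention: the empty training set scores zero
    correct = 0
    for x, y in zip(VAL_X, VAL_Y):
        # Nearest training point in the subset; ties go to the smaller index.
        nearest = min(subset, key=lambda j: (abs(TRAIN_X[j] - x), j))
        correct += TRAIN_Y[nearest] == y
    return correct / len(VAL_X)


def exact_shapley(n: int, utility: Utility) -> List[float]:
    """Exact Shapley values by enumerating all 2^(n-1) subsets per point."""
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight 1 / (n * C(n-1, |S|)) from the subset form of the Shapley value.
            weight = 1.0 / (n * math.comb(n - 1, size))
            for subset in itertools.combinations(others, size):
                s = frozenset(subset)
                values[i] += weight * (utility(s | {i}) - utility(s))
    return values


def monte_carlo_shapley(n: int, utility: Utility,
                        num_permutations: int, seed: int = 0) -> List[float]:
    """Estimate Shapley values by averaging each point's marginal
    contribution over uniformly random orderings of the training points."""
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(num_permutations):
        order = list(range(n))
        rng.shuffle(order)
        prefix: FrozenSet[int] = frozenset()
        prev = utility(prefix)
        for i in order:
            prefix = prefix | {i}
            curr = utility(prefix)
            totals[i] += curr - prev  # marginal contribution of point i
            prev = curr
    return [t / num_permutations for t in totals]


if __name__ == "__main__":
    exact = exact_shapley(len(TRAIN_X), knn_utility)
    approx = monte_carlo_shapley(len(TRAIN_X), knn_utility, num_permutations=5000)
    for i, (e, a) in enumerate(zip(exact, approx)):
        print(f"point {i}: exact={e:.3f}  monte_carlo={a:.3f}")
```

The exact routine makes on the order of N·2^(N-1) utility calls, each of which would mean retraining a model in a real setting, while the sampler makes N + 1 calls per permutation. Both outputs sum to U(I) − U(∅), the efficiency property of the Shapley value, because each sampled permutation's marginal contributions telescope to that total.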

Code Implementations: 1 repo