DB, LG | Dec 2, 2024

A Comprehensive Study of Shapley Value in Data Analytics

arXiv:2412.01460v84 citations | h-index: 31 | Has Code | Proc VLDB Endow
Originality: Synthesis-oriented
AI Analysis

The paper provides a systematic review and tooling for data scientists who use the Shapley value, but it is incremental: it synthesizes existing work rather than introducing new breakthroughs.

This paper presents a comprehensive study of Shapley value applications in data analytics, identifying key challenges like computation efficiency and privacy, and introduces SVBench, an open-source framework for evaluation.

In recent years, the Shapley value (SV), a solution concept from cooperative game theory, has found numerous applications in data analytics (DA). This paper presents the first comprehensive study of SV used throughout the DA workflow, clarifying the key variables in defining DA-applicable SV and the essential functionalities that SV can provide for data scientists. We distill four primary challenges of using SV in DA, namely computation efficiency, approximation error, privacy preservation, and interpretability; disentangle the resolution techniques from existing work in this field; and then analyze and discuss the techniques with respect to each challenge and the potential conflicts between challenges. We also implement SVBench, a modular and extensible open-source framework for developing SV applications in different DA tasks, and conduct extensive evaluations to validate our analyses and discussions. Based on the qualitative and quantitative results, we identify the limitations of current efforts for applying SV to DA and highlight the directions of future research and engineering.
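To make the computation-efficiency and approximation-error challenges concrete, the following is a minimal sketch that contrasts exact SV computation (averaging marginal contributions over all orderings) with a Monte Carlo estimate from sampled orderings. It is not SVBench code and does not use its API; the function names, the three "players", and `toy_utility` are hypothetical and chosen only for illustration.

```python
import itertools
import math
import random


def _add_marginals(order, utility, totals):
    """Add each player's marginal contribution along one ordering to totals."""
    coalition = set()
    prev = utility(frozenset(coalition))
    for player in order:
        coalition.add(player)
        cur = utility(frozenset(coalition))
        totals[player] += cur - prev
        prev = cur


def exact_shapley(players, utility):
    """Exact SV: average marginal contributions over all n! orderings."""
    totals = {p: 0.0 for p in players}
    for order in itertools.permutations(players):
        _add_marginals(order, utility, totals)
    n_orders = math.factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}


def monte_carlo_shapley(players, utility, num_samples, seed=0):
    """Approximate SV: average marginal contributions over random orderings."""
    rng = random.Random(seed)
    order = list(players)
    totals = {p: 0.0 for p in order}
    for _ in range(num_samples):
        rng.shuffle(order)
        _add_marginals(order, utility, totals)
    return {p: t / num_samples for p, t in totals.items()}


def toy_utility(coalition):
    """Hypothetical utility: 'a' and 'b' are redundant, 'c' is independent."""
    score = 0.0
    if "a" in coalition or "b" in coalition:
        score += 0.6
    if "c" in coalition:
        score += 0.3
    return score


if __name__ == "__main__":
    players = ["a", "b", "c"]
    print("exact:      ", exact_shapley(players, toy_utility))
    print("monte carlo:", monte_carlo_shapley(players, toy_utility, 200))
```

The exact version evaluates the utility along every one of the n! orderings, which is what makes it impractical beyond a handful of players; the sampled version trades that cost for an estimation error that shrinks as `num_samples` grows. In a real DA task the utility would typically be something expensive, such as the validation accuracy of a model trained on the coalition's data.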

Code Implementations: 2 repos
