Faithful Group Shapley Value
This work addresses a security and fairness problem in data valuation that matters to machine learning practitioners. It offers a robust defense against manipulation, although extending Data Shapley to the group level is an incremental step.
The paper shows that existing group-level data valuation methods are vulnerable to strategic manipulation, such as shell company attacks, and proposes Faithful Group Shapley Value (FGSV) to defend against these attacks. Its empirical results show significant improvements in computational efficiency and approximation accuracy over state-of-the-art methods.
Data Shapley is an important tool for data valuation, which quantifies the contribution of individual data points to machine learning models. In practice, group-level data valuation is desirable when data providers contribute data in batches. However, we identify that existing group-level extensions of Data Shapley are vulnerable to shell company attacks, where strategic group splitting can unfairly inflate valuations. We propose Faithful Group Shapley Value (FGSV), which uniquely defends against such attacks. Building on original mathematical insights, we develop a provably fast and accurate approximation algorithm for computing FGSV. Empirical experiments demonstrate that our algorithm significantly outperforms state-of-the-art methods in computational efficiency and approximation accuracy, while ensuring faithful group-level valuation.
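A small sketch makes the shell company attack concrete. It assumes the naive group-level extension treats each group as a single Shapley player and computes that player's value exactly by averaging its marginal contribution over all orderings of groups. The utility `sqrt(|S|)` is a hypothetical stand-in for model performance with diminishing returns, and the names `utility` and `group_shapley` are illustrative only. This is not the FGSV definition or the paper's approximation algorithm; it only shows how splitting one provider's data into two "shell" groups can raise that provider's total value while lowering the honest provider's share.

```python
from itertools import permutations
from math import factorial, sqrt


def utility(points):
    """Hypothetical utility with diminishing returns (stand-in for model accuracy)."""
    return sqrt(len(points))


def group_shapley(groups, utility):
    """Exact Shapley value with each group treated as one player.

    Averages each group's marginal contribution over every ordering of groups.
    Cost grows factorially, so this is only usable for a handful of groups.
    """
    names = list(groups)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        coalition = frozenset()
        for name in order:
            before = utility(coalition)
            coalition = coalition | groups[name]
            totals[name] += utility(coalition) - before
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in totals.items()}


# Honest reporting: provider A submits both of its points as one group.
honest = {"A": frozenset({"a1", "a2"}), "B": frozenset({"b"})}
# Shell company attack: provider A splits the same points into two groups.
split = {"A1": frozenset({"a1"}), "A2": frozenset({"a2"}), "B": frozenset({"b"})}

v_honest = group_shapley(honest, utility)
v_split = group_shapley(split, utility)

print(round(v_honest["A"], 4))                  # 1.0731
print(round(v_split["A1"] + v_split["A2"], 4))  # 1.1547
print(round(v_honest["B"], 4))                  # 0.6589
print(round(v_split["B"], 4))                   # 0.5774
```

Under this toy utility, the same two data points earn provider A more value once they are reported as two groups, and provider B loses value without changing anything. With a utility that has increasing rather than diminishing returns, the direction of the effect could differ, so the sketch illustrates the vulnerability rather than measuring it.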