LG · AI · CR · Dec 22, 2024

Data value estimation on private gradients

arXiv:2412.17008v1 · 1 citation · h-index: 39
Originality: Highly original
AI Analysis

The work addresses a bottleneck for privacy-aware applications such as data pricing and federated learning, offering a novel way to keep data valuation accurate under DP constraints.

The paper tackles data value estimation when differential privacy is enforced via gradient perturbation. It shows that standard i.i.d. noise injection leads to poor estimates and proposes correlated noise that provably reduces estimation uncertainty, with empirical improvements on various ML tasks.

For gradient-based machine learning (ML) methods commonly adopted in practice, such as stochastic gradient descent, the de facto differential privacy (DP) technique is to perturb the gradients with random Gaussian noise. Data valuation attributes ML performance to the training data and is widely used in privacy-aware applications that require enforcing DP, such as data pricing, collaborative ML, and federated learning (FL). Can existing data valuation methods still be used when DP is enforced via gradient perturbations? We show that the answer is no under the default approach of injecting i.i.d. random noise into the gradients: the estimation uncertainty of the data value estimate paradoxically scales linearly with the estimation budget, producing estimates that are close to random guesses. To address this issue, we propose instead injecting carefully correlated noise, which provably removes the linear scaling of estimation uncertainty w.r.t. the budget. We also empirically demonstrate that our method gives better data value estimates on various ML tasks and is applicable to use cases including dataset valuation and FL.
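
As a rough illustration of the gradient perturbation the abstract refers to, the sketch below clips per-example gradients and adds i.i.d. Gaussian noise before averaging, in the style of DP-SGD. It is a minimal sketch of the standard baseline, not the paper's correlated-noise method; the function name `perturb_gradients` and the parameters `clip_norm` and `noise_multiplier` are illustrative assumptions and do not come from the source.

```python
import numpy as np


def perturb_gradients(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, and average.

    Hypothetical helper for illustration only (names are assumptions).
    per_example_grads: array of shape (batch_size, dim).
    """
    # L2 norm of each example's gradient, kept as a column for broadcasting.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Scale down gradients whose norm exceeds clip_norm; leave others unchanged.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # i.i.d. Gaussian noise with standard deviation noise_multiplier * clip_norm.
    noise = rng.normal(
        0.0, noise_multiplier * clip_norm, size=per_example_grads.shape[1]
    )
    batch_size = per_example_grads.shape[0]
    return (clipped.sum(axis=0) + noise) / batch_size


rng = np.random.default_rng(0)
grads = np.array([[3.0, 4.0], [0.3, 0.4]])
noisy_mean_grad = perturb_gradients(grads, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Each call draws fresh, independent noise, so a valuation method that evaluates many such perturbed gradients accumulates independent noise terms. This is the i.i.d. setting the abstract identifies as problematic.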

