LG · AI · CR · May 4, 2023

Incentivising the federation: gradient-based metrics for data selection and valuation in private decentralised training

arXiv:2305.02942v3 · 3 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of incentivizing data owners in federated learning by providing tools for data valuation, though it is incremental as it builds on existing gradient-based techniques.

The paper tackles the challenge of selecting beneficial data for collaborative model training in private federated settings, where differential privacy noise can obscure the value of underrepresented samples, and demonstrates that gradient-based methods like VoG and PLIS enable principled data selection even under strict privacy constraints.

Obtaining high-quality data for collaborative training of machine learning models can be challenging due to (a) regulatory concerns and (b) a lack of incentives for data owners to participate. The first issue can be addressed by combining distributed machine learning techniques (e.g. federated learning) with privacy-enhancing technologies (PETs), such as differentially private (DP) model training. The second can be addressed by rewarding participants for giving access to data that is beneficial to the model being trained. This is of particular importance in federated settings, where the data is unevenly distributed. However, DP noise can adversely affect underrepresented and atypical (yet often informative) data samples, making it difficult to assess their usefulness. In this work, we investigate how to leverage gradient information to allow the participants in private training settings to select the data most beneficial to the jointly trained model. We assess two such methods, namely variance of gradients (VoG) and the privacy loss-input susceptibility score (PLIS). We show that these techniques can provide federated clients with tools for principled data selection even in stricter privacy settings.
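To make the variance-of-gradients idea concrete, here is a minimal sketch in Python/NumPy. It is not the paper's implementation: it assumes the per-sample input gradients have already been collected at several training checkpoints, and it uses a simplified reading of VoG (variance across checkpoints, averaged over input features, then standardised within each class). The function names `vog_scores` and `normalise_per_class` are hypothetical, and the exact normalisation used by the authors may differ. How the score interacts with DP noise is not modelled here.

```python
import numpy as np

def vog_scores(input_grads):
    """Simplified VoG score per sample.

    input_grads: array of shape (K, N, D) holding, for each of K checkpoints,
    the gradient of the true-class logit w.r.t. each of N inputs (D features).
    This layout is an assumption made for illustration.
    """
    g = np.asarray(input_grads, dtype=float)
    per_feature_var = g.var(axis=0)        # (N, D): variance across checkpoints
    return per_feature_var.mean(axis=1)    # (N,): average over input features

def normalise_per_class(scores, labels):
    """Standardise scores within each class so classes are comparable."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    out = np.zeros_like(scores)
    for c in np.unique(labels):
        mask = labels == c
        mu, sd = scores[mask].mean(), scores[mask].std()
        # A class whose scores are all equal gets a normalised score of 0.
        out[mask] = (scores[mask] - mu) / sd if sd > 0 else 0.0
    return out

# Usage with synthetic gradients (random data, for illustration only).
rng = np.random.default_rng(0)
grads = rng.normal(size=(5, 8, 16))        # 5 checkpoints, 8 samples, 16 features
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
ranked = np.argsort(-normalise_per_class(vog_scores(grads), labels))
```

Under this reading, a client could rank its local samples by the normalised score and contribute the highest-ranked ones first. Whether high or low scores are preferable depends on the selection policy, which this sketch does not prescribe.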

