Mar 27, 2023

PADME-SoSci: A Platform for Analytics and Distributed Machine Learning for the Social Sciences

arXiv:2303.18200v2 · 1 citation · h-index: 14
Originality: Synthesis-oriented
AI Analysis

This work addresses data privacy and ownership challenges for social scientists, but it is incremental: it applies existing federated learning concepts to a specific domain.

The authors tackle data privacy and ownership in social data science with PADME, a distributed analytics tool that uses federated learning to train models across multiple data locations without centralizing the data. This preserves data ownership and privacy while still enabling analysis as if all data were in one place.
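The "visit each location, release only at the end" idea can be sketched in a few lines. This is a minimal toy, not PADME's implementation or API: a hypothetical `LinearRegressionTrain` object travels to each site in turn, updates running sums on that site's rows, and refuses to return results until the whole itinerary has been visited. Simple least-squares regression is chosen as an assumption because its sufficient statistics add up exactly regardless of visiting order, so the result matches a fit on pooled data; for iteratively trained models (e.g. SGD), a single sequential pass is generally not identical to centralized training. All names and data below are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class LinearRegressionTrain:
    """Hypothetical analysis 'train' that carries only running sums between sites."""

    n: int = 0
    sum_x: float = 0.0
    sum_y: float = 0.0
    sum_xx: float = 0.0
    sum_xy: float = 0.0
    visited: list = field(default_factory=list)

    def visit(self, site_name, rows):
        # Runs inside the data owner's environment: raw (x, y) rows stay at the
        # site; only the aggregate sums below travel on to the next stop.
        for x, y in rows:
            self.n += 1
            self.sum_x += x
            self.sum_y += y
            self.sum_xx += x * x
            self.sum_xy += x * y
        self.visited.append(site_name)

    def result(self, itinerary):
        # Withhold the fitted model until every site on the itinerary was visited.
        if self.visited != list(itinerary):
            raise RuntimeError("analysis incomplete: results withheld")
        denom = self.n * self.sum_xx - self.sum_x ** 2
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return slope, intercept


def fit_centralized(rows):
    # Reference least-squares fit on pooled data, i.e. the centralized setting
    # that the travelling sketch avoids.
    mx = sum(x for x, _ in rows) / len(rows)
    my = sum(y for _, y in rows) / len(rows)
    slope = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x, _ in rows)
    return slope, my - slope * mx


# Hypothetical toy data split across three data owners.
sites = {
    "site_a": [(1.0, 2.1), (2.0, 3.9)],
    "site_b": [(3.0, 6.2), (4.0, 7.8)],
    "site_c": [(5.0, 10.1)],
}
itinerary = list(sites)

train = LinearRegressionTrain()
for name in itinerary:
    train.visit(name, sites[name])

slope, intercept = train.result(itinerary)
c_slope, c_intercept = fit_centralized([row for rows in sites.values() for row in rows])
print(math.isclose(slope, c_slope), math.isclose(intercept, c_intercept, abs_tol=1e-9))
# prints: True True
```

Calling `result` on a train that has only visited `site_a` raises `RuntimeError`, mirroring the rule that results are not provided until every location has been processed.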

Data privacy and ownership are significant concerns in social data science, with legal and ethical implications. Sharing and analyzing data is difficult when different parties own different parts of it. One approach to this challenge is to apply de-identification or anonymization techniques to the data before collecting it for analysis. However, this can reduce data utility and increase the risk of re-identification. To address these limitations, we present PADME, a distributed analytics tool that federates model implementation and training. PADME uses a federated approach in which the model is implemented and deployed by all parties and visits each data location incrementally for training. This enables analysis across locations while still allowing the model to be trained as if all data were in a single location. Training the model on data in its original location preserves data ownership. Furthermore, results are not released until the analysis has been completed at all data locations, to ensure privacy and avoid bias in the results.
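To make the utility cost of de-identification concrete, here is a toy illustration that is not taken from the paper. It assumes one common anonymization step, generalizing exact ages into 10-year bands, as a stand-in for the techniques the abstract mentions. Banding raises the smallest group of indistinguishable respondents from 1 to 2, but the analyst now only has approximate values, so the estimated mean drifts. A group of two still offers weak protection, which hints at the residual re-identification risk. The ages, band width, and helper names are hypothetical.

```python
from collections import Counter
from statistics import mean

BAND_WIDTH = 10  # hypothetical generalization level: 10-year age bands

# Hypothetical exact ages from one survey wave.
ages = [23, 27, 31, 34, 38, 41, 45, 47, 52, 58]


def generalize(age, width=BAND_WIDTH):
    # De-identification by generalization: keep only the band's lower bound (34 -> 30).
    return (age // width) * width


def smallest_group(values):
    # Size of the smallest group sharing a value (the "k" of k-anonymity for one attribute).
    return min(Counter(values).values())


bands = [generalize(a) for a in ages]

exact_mean = mean(ages)
# An analyst who only sees bands has to approximate each age by the band midpoint.
banded_mean = mean(b + BAND_WIDTH / 2 for b in bands)

print(f"k: {smallest_group(ages)} -> {smallest_group(bands)}")  # prints: k: 1 -> 2
print(f"mean age: exact {exact_mean:.1f}, from bands {banded_mean:.1f}")
# prints: mean age: exact 39.6, from bands 40.0
```

Wider bands raise k further but make estimates coarser still. A federated approach avoids this trade-off by computing on the exact values where they are stored.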
