CR, LG | Oct 30, 2023

Scalable and Privacy-Preserving Synthetic Data Generation on Decentralised Web

arXiv:2310.20062v2 | 2 citations | h-index: 3
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of scalable, privacy-preserving synthetic data generation for AI development on the decentralized web, though its contribution is an incremental improvement to an existing system.

The paper tackles the scalability limitations of the Libertas system for decentralized synthetic data generation by integrating secure enclaves (Intel SGX) with MPC, achieving up to 10x faster computation and 5x lower communication overhead in experiments.

Data on the Web has fueled much of the recent progress in AI. As more high-quality data becomes difficult to access, synthetic data is emerging as a promising solution for privacy-friendly data release and complementing real datasets in developing robust and safe AI. But there is limited work on decentralised, scalable and contributor-centric synthetic data generation systems. A recent proposal, called Libertas, allows data contributors to autonomously participate in joint computations over their Web data without relying on a trusted centre. Libertas uses Solid (Social Linked Data) and MPC (Secure Multi-Party Computation) to achieve this goal. Solid is a decentralised Web specification that lets anyone store their data securely in their personal decentralised data stores called Pods and control which applications have access to their data. MPC refers to the set of cryptographic methods for different parties to jointly compute a function over their inputs while keeping those inputs private. Thus, Libertas can also be used to generate synthetic data from otherwise inaccessible Web data in a responsible way, by ensuring contributor autonomy, decentralisation and privacy. However, the scalability of this system remains limited due to the high computation and communication costs in MPC. In this paper, we show how one can improve Libertas using secure enclaves (in addition to MPC) to address the scalability challenge. Secure enclaves such as Intel SGX rely on hardware-based features for confidentiality and integrity of code and data. We discuss a principled approach for integrating SGX within the Libertas architecture for scalable differentially private synthetic data generation, and support our analysis with rigorous empirical results on simulated and real datasets and different synthetic data generation algorithms.
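To make the abstract's description of MPC concrete, the following is a minimal sketch of additive secret sharing, one of the simplest MPC building blocks. It is an illustration only and is not the protocol used by Libertas or by the paper; the modulus `PRIME`, the function names `share`, `reconstruct` and `secure_sum`, and the three-party setup are all assumptions made for this example. Each contributor splits its private input into random shares, each computing party sums the shares it receives, and only the combined total is ever reconstructed.

```python
import secrets

# Field modulus for the shares (an assumption for this sketch).
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split `value` into n additive shares that sum to `value` mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine additive shares into the value they encode."""
    return sum(shares) % PRIME


def secure_sum(inputs, n_parties=3):
    """Sum private inputs without any single party seeing an input in the clear."""
    # Party j receives the j-th share of every contributor's input.
    per_party = [[] for _ in range(n_parties)]
    for x in inputs:
        for j, s in enumerate(share(x, n_parties)):
            per_party[j].append(s)
    # Each party adds its shares locally; a partial sum alone reveals nothing.
    partial_sums = [sum(column) % PRIME for column in per_party]
    # Only the combination of all partial sums yields the total.
    return reconstruct(partial_sums)


if __name__ == "__main__":
    print(secure_sum([5, 12, 30]))
```

Every input costs one share per party in communication, which hints at why the abstract describes MPC as expensive at scale and why offloading part of the computation to a secure enclave is attractive.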
