CR LG · Oct 5, 2023

FLAIM: AIM-based Synthetic Data Generation in the Federated Setting

Samuel Maddock, Graham Cormode, Carsten Maple

arXiv:2310.03447v3 · 10.5 · 10 citations · h-index: 8 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses privacy-preserving data sharing for organizations whose data is distributed, but it is an incremental advance because it builds on an existing centralized method (AIM).

The paper tackles the problem of generating privacy-preserving synthetic tabular data in federated settings. It proposes FLAIM to address two issues, utility degradation under client heterogeneity and the overhead of secure multi-party computation, and reports improved utility with reduced overhead across benchmark datasets.

Preserving individual privacy while enabling collaborative data sharing is crucial for organizations. Synthetic data generation is one solution, producing artificial data that mirrors the statistical properties of private data. While numerous techniques have been devised under differential privacy, they predominantly assume data is centralized. However, data is often distributed across multiple clients in a federated manner. In this work, we initiate the study of federated synthetic tabular data generation. Building upon a state-of-the-art (SOTA) central method known as AIM, we present DistAIM and FLAIM. We first show that it is straightforward to distribute AIM, extending a recent approach based on secure multi-party computation, which necessitates additional overhead and makes it less suited to federated scenarios. We then demonstrate that naively federating AIM can lead to substantial degradation in utility in the presence of heterogeneity. To mitigate both issues, we propose an augmented FLAIM approach that maintains a private proxy of heterogeneity. We simulate our methods across a range of benchmark datasets under different degrees of heterogeneity and show we can improve utility while reducing overhead.
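The abstract does not spell out the mechanics, so what follows is a minimal Python sketch under stated assumptions, not the authors' implementation. It illustrates two differential-privacy building blocks used by AIM-style methods: an exponential mechanism that picks which marginal (a count table over a few attributes) to measure, and Gaussian noise added to marginal counts. Here each client is assumed to perturb its own local counts before a server sums them. The function names (`local_marginal`, `federated_noisy_marginal`, `exponential_mechanism`) and the toy data are hypothetical. The sketch omits privacy accounting, the model-fitting step, and FLAIM's heterogeneity proxy.

```python
import math
import random
from collections import Counter
from itertools import product

def local_marginal(records, attrs, domains):
    """Count each combination of values of `attrs` in one client's records (hypothetical helper)."""
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    # Enumerate every cell of the marginal so empty cells are reported as 0.
    cells = product(*(domains[a] for a in attrs))
    return {cell: counts.get(cell, 0) for cell in cells}

def federated_noisy_marginal(clients, attrs, domains, sigma, rng):
    """Assumed scheme: each client adds N(0, sigma^2) noise to its own counts; the server sums them."""
    total = {}
    for records in clients:
        local = local_marginal(records, attrs, domains)
        for cell, count in local.items():
            total[cell] = total.get(cell, 0.0) + count + rng.gauss(0.0, sigma)
    return total

def exponential_mechanism(scores, epsilon, sensitivity, rng):
    """Pick a key with probability proportional to exp(epsilon * score / (2 * sensitivity))."""
    keys = list(scores)
    best = max(scores.values())
    # Subtracting the best score keeps exp() from overflowing; the distribution is unchanged.
    weights = [math.exp(epsilon * (scores[k] - best) / (2 * sensitivity)) for k in keys]
    return rng.choices(keys, weights=weights, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    domains = {"age": ["young", "old"], "smoker": [0, 1]}
    # Two heterogeneous toy clients: their local distributions differ sharply.
    client_a = [{"age": "young", "smoker": 0}] * 30 + [{"age": "old", "smoker": 1}] * 5
    client_b = [{"age": "old", "smoker": 1}] * 40
    noisy = federated_noisy_marginal([client_a, client_b], ("age", "smoker"), domains, 2.0, rng)
    print(noisy)
    # Hypothetical quality scores for two candidate marginals.
    print(exponential_mechanism({("age",): 1.0, ("age", "smoker"): 5.0}, 1.0, 1.0, rng))
```

In this sketch every client adds its own noise, so the summed counts carry noise with standard deviation `sigma` times the square root of the number of clients. That accumulation is one simple way to see why a naively federated measurement step can cost utility compared with a central one.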

