Private data sharing between decentralized users through the privGAN architecture
This addresses privacy concerns for enterprises that cannot share data, whether to preserve a competitive advantage or to protect their clients. It offers a practical solution for decentralized users, though it builds incrementally on existing GAN and federated learning concepts.
The paper tackles private data sharing with a method based on the privGAN architecture. Synthetic data is generated without sharing either the real data or the parameters of models that have direct access to it. Owners obtain better utility from the synthetic data than from their small real datasets alone, and white-box attacks against the architecture yield results close to random guessing.
More data is almost always beneficial for analysis and machine learning tasks. In many realistic situations, however, an enterprise cannot share its data, either to keep a competitive advantage or to protect the privacy of its data sources (for example, the enterprise's clients). We propose a method for data owners to share synthetic, or fake, versions of their data without sharing either the actual data or the parameters of models that have direct access to it. The proposed method is based on the privGAN architecture: local GANs are trained on their respective data subsets with an extra penalty from a central discriminator that tries to identify the origin of a given fake sample. We demonstrate that this approach, applied to subsets of various sizes, gives the owners better utility than their small real datasets alone. The only shared information is the parameter updates of the central discriminator. Privacy is demonstrated with white-box attacks on the most vulnerable elements of the architecture, and the attack results are close to random guessing. This method would apply naturally in a federated learning setting.
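To make the role of the central discriminator more concrete, here is a minimal NumPy sketch of the two losses involved. It assumes the privacy penalty takes the form of the original privGAN objective as we read it: each local generator minimizes λ·log D_p^i(G_i(z)), the log-probability that the central discriminator assigns to its true origin, while the central discriminator minimizes the cross-entropy of origin classification. The function names, the non-saturating GAN term, `EPS` and the default `lam` are illustrative assumptions, not the paper's code. Discriminator outputs are passed in as arrays so the sketch stays framework-agnostic.

```python
import numpy as np

EPS = 1e-8  # guards against log(0)


def local_generator_loss(d_local_fake, d_central_fake, owner_idx, lam=1.0):
    """Loss for the generator of owner `owner_idx` on a batch of its fake samples.

    d_local_fake:   shape (B,), the owner's local discriminator probability
                    that each fake sample is real.
    d_central_fake: shape (B, N), the central discriminator's softmax over
                    the N owners for the same fake samples.
    lam:            weight of the privacy penalty (hypothetical default).
    """
    # Non-saturating GAN term: push the local discriminator to call fakes real.
    gan_term = -np.mean(np.log(d_local_fake + EPS))
    # Privacy penalty (assumed privGAN form): minimizing this lowers the
    # probability the central discriminator assigns to this owner as origin.
    priv_term = np.mean(np.log(d_central_fake[:, owner_idx] + EPS))
    return gan_term + lam * priv_term


def central_discriminator_loss(d_central_fake, origins):
    """Cross-entropy of the central discriminator at identifying which owner
    generated each fake sample; `origins` holds integer owner indices, shape (B,)."""
    origins = np.asarray(origins)
    p_true = d_central_fake[np.arange(len(origins)), origins]
    return -np.mean(np.log(p_true + EPS))


# Example: 4 owners, central discriminator reduced to random guessing.
n_owners = 4
uniform = np.full((3, n_owners), 1.0 / n_owners)
print(central_discriminator_loss(uniform, [0, 1, 2]))  # about log(4) = 1.386
print(local_generator_loss(np.full(3, 0.5), uniform, owner_idx=0))  # about -log(2) = -0.693
```

With N owners, a central discriminator that can do no better than random guessing outputs 1/N for every origin, so its cross-entropy equals log N. Under this sketch, that is the point at which the fake samples carry no origin signature the central discriminator can exploit. In a federated deployment, only the gradients of `central_discriminator_loss` with respect to the central discriminator's parameters would leave each owner.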