DPD-fVAE: Synthetic Data Generation Using Federated Variational Autoencoders With Differentially-Private Decoder
This work addresses privacy-preserving synthetic data generation for domains such as healthcare. Its contribution is incremental, since it builds on existing federated learning and differential privacy methods.
The paper tackles the problem of generating synthetic data from sensitive, distributed datasets. It proposes DPD-fVAE, a federated variational autoencoder with a differentially-private decoder, and reports competitive performance on MNIST, Fashion-MNIST, and CelebA in terms of Fréchet Inception Distance and the accuracy of classifiers trained on the synthesised data.
Federated learning (FL) is receiving increasing attention for processing sensitive, distributed datasets common to domains such as healthcare. Instead of directly training classification models on these datasets, recent works have considered training data generators capable of synthesising a new dataset which is not protected by any privacy restrictions. The synthetic data can thus be made available to anyone, which enables further evaluation of machine learning architectures and research questions off-site. As an additional layer of privacy preservation, differential privacy can be introduced into the training process. We propose DPD-fVAE, a federated Variational Autoencoder with Differentially-Private Decoder, to synthesise a new, labelled dataset for subsequent machine learning tasks. By synchronising only the decoder component with FL, we can reduce the privacy cost per epoch and thus enable better data generators. In our evaluation on MNIST, Fashion-MNIST and CelebA, we show the benefits of DPD-fVAE and report performance competitive with related work in terms of Fréchet Inception Distance and accuracy of classifiers trained on the synthesised dataset.
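The decoder-only synchronisation described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the privacy mechanism, so the code below assumes a common choice: each client's decoder update is clipped in L2 norm and Gaussian noise is added to the sum before averaging. The names `clip_update`, `aggregate_decoder`, `clip_norm`, and `noise_multiplier` are hypothetical, parameters are flat lists of floats rather than real network weights, and local VAE training and privacy accounting are omitted.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale an update vector so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def aggregate_decoder(global_decoder, client_decoders, clip_norm,
                      noise_multiplier, rng):
    """One server round over decoder parameters only.

    Assumed mechanism (not confirmed by the source): clip each client's
    update, sum the clipped updates, add Gaussian noise to the sum, average.
    """
    summed = [0.0] * len(global_decoder)
    for local_decoder in client_decoders:
        # Update = locally trained decoder minus the current global decoder.
        update = [l - g for l, g in zip(local_decoder, global_decoder)]
        for i, u in enumerate(clip_update(update, clip_norm)):
            summed[i] += u
    sigma = noise_multiplier * clip_norm  # noise scale tied to the clip bound
    n = len(client_decoders)
    return [g + (s + rng.gauss(0.0, sigma)) / n
            for g, s in zip(global_decoder, summed)]


# Hypothetical client state after local training. Encoders stay on the
# clients; only the decoder vectors are passed to the server.
clients = [
    {"encoder": [0.3, -0.1], "decoder": [1.0, 0.0]},
    {"encoder": [0.7, 0.2], "decoder": [0.0, 1.0]},
]
global_decoder = [0.0, 0.0]
rng = random.Random(0)
new_global_decoder = aggregate_decoder(
    global_decoder,
    [c["decoder"] for c in clients],
    clip_norm=1.0,
    noise_multiplier=1.1,
    rng=rng,
)
```

In this sketch the encoder parameters never leave the client, so only the decoder is clipped and perturbed. That is one plausible reading of how restricting synchronisation to the decoder could lower the privacy cost per epoch.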