Differentially Private Synthetic Medical Data Generation using Convolutional GANs
This work provides a method for researchers to generate privacy-preserving synthetic medical data, enabling the development of deep learning models on sensitive health records.
This paper addresses the privacy challenges of using health record data for deep learning by generating realistic synthetic data. The authors develop a differentially private framework based on convolutional autoencoders and GANs, and report that it outperforms existing state-of-the-art models under the same privacy budget on several benchmark medical datasets.
Deep learning models have demonstrated strong performance in applications such as image classification and speech processing. However, building a deep learning model on health record data requires addressing privacy challenges that raise concerns specific to this domain. One effective way to handle such private data is to generate realistic synthetic data that offers practically acceptable data quality and, correspondingly, model performance. To tackle this challenge, we develop a differentially private framework for synthetic data generation using Rényi differential privacy. Our approach builds on convolutional autoencoders and convolutional generative adversarial networks to preserve some of the critical characteristics in the generated synthetic data. Our model can also capture the temporal information and feature correlations that might be present in the original data. We demonstrate that our model outperforms existing state-of-the-art models under the same privacy budget on several publicly available benchmark medical datasets in both supervised and unsupervised settings.
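To make the notion of a privacy budget under Rényi differential privacy more concrete, the following is a minimal sketch of textbook RDP accounting for the Gaussian mechanism. It is not the authors' accountant: it assumes unit L2 sensitivity, ignores any amplification from subsampling, and the function names (`gaussian_rdp`, `rdp_to_dp`, `best_epsilon`) and the grid of orders are hypothetical choices made for illustration. It relies on three standard facts: the Gaussian mechanism with noise scale sigma satisfies (alpha, alpha / (2 sigma^2))-RDP, RDP guarantees of the same order add up under composition, and an (alpha, eps)-RDP guarantee implies (eps + log(1/delta) / (alpha - 1), delta)-differential privacy.

```python
import math


def gaussian_rdp(alpha, sigma, sensitivity=1.0):
    """RDP epsilon of order alpha for one Gaussian mechanism release.

    Assumes the query has the given L2 sensitivity and that noise with
    standard deviation sigma is added (no subsampling amplification).
    """
    return alpha * sensitivity ** 2 / (2.0 * sigma ** 2)


def rdp_to_dp(alpha, rdp_eps, delta):
    """Convert an (alpha, rdp_eps)-RDP guarantee to (eps, delta)-DP."""
    return rdp_eps + math.log(1.0 / delta) / (alpha - 1.0)


def best_epsilon(sigma, steps, delta, orders=None):
    """Smallest (eps, delta)-DP epsilon over a grid of RDP orders.

    RDP composes additively, so `steps` releases at order alpha cost
    steps * gaussian_rdp(alpha, sigma) in total.
    """
    if orders is None:
        # Hypothetical grid of orders 1.1, 1.2, ..., 100.9.
        orders = [1.0 + k / 10.0 for k in range(1, 1000)]
    return min(
        rdp_to_dp(a, steps * gaussian_rdp(a, sigma), delta) for a in orders
    )


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    eps = best_epsilon(sigma=4.0, steps=100, delta=1e-5)
    print(f"epsilon after 100 noisy steps: {eps:.3f}")
```

Comparing models "under the same privacy budget" then amounts to fixing the resulting (epsilon, delta) pair and asking which generator produces more useful synthetic data at that level of noise.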