Synthesizing Tabular Data using Generative Adversarial Networks
This work addresses the need for realistic synthetic data in domains such as healthcare and education, but it is incremental: it adapts existing GAN methods to tabular data.
The paper tackles the problem of generating synthetic tabular data, such as medical or educational records, by introducing Tabular GAN (TGAN), a generative adversarial network that produces high-quality synthetic tables containing both discrete and continuous variables. On three datasets, TGAN outperforms conventional statistical generative models both in capturing correlations between columns and in scaling to large datasets.
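The claim about "capturing correlations between columns" can be made concrete with a simple score. The sketch below compares the Pearson correlation matrices of a real and a synthetic table and reports their mean absolute difference (lower is better). This is an illustrative assumption, not necessarily the evaluation the paper uses, and `correlation_gap` is a hypothetical name. It handles numeric columns only; discrete columns would first need an encoding or a different association measure.

```python
import numpy as np


def correlation_gap(real: np.ndarray, synthetic: np.ndarray) -> float:
    """Mean absolute difference between the Pearson correlation matrices
    of two numeric tables (rows = records, columns = variables)."""
    if real.shape[1] != synthetic.shape[1]:
        raise ValueError("tables must have the same number of columns")
    real_corr = np.corrcoef(real, rowvar=False)
    synth_corr = np.corrcoef(synthetic, rowvar=False)
    # Compare only off-diagonal entries; the diagonal is always 1.
    mask = ~np.eye(real.shape[1], dtype=bool)
    return float(np.abs(real_corr - synth_corr)[mask].mean())


# Usage: a "real" table with two strongly correlated columns.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
real = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=1000)])

# A faithful copy preserves the correlation exactly.
print(correlation_gap(real, real))  # 0.0

# Shuffling one column keeps its marginal distribution but destroys the
# dependency between columns, so the gap becomes large.
shuffled = real.copy()
shuffled[:, 1] = rng.permutation(shuffled[:, 1])
print(correlation_gap(real, shuffled))
```

The shuffled table shows why per-column statistics are not enough. Each column of a synthetic table can match its real counterpart perfectly while the relationships between columns are lost.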
Generative adversarial networks (GANs) implicitly learn the probability distribution of a dataset and can draw samples from that distribution. This paper presents Tabular GAN (TGAN), a generative adversarial network that can generate tabular data such as medical or educational records. Built on deep neural networks, TGAN generates high-quality, fully synthetic tables, producing discrete and continuous variables simultaneously. When we evaluate our model on three datasets, we find that TGAN outperforms conventional statistical generative models both in capturing the correlation between columns and in scaling up to large datasets.
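The abstract does not spell out TGAN's architecture. The NumPy sketch below illustrates in general terms what "producing discrete and continuous variables simultaneously" can mean for a generator. It assumes a plain feed-forward generator that maps noise to a shared hidden layer, then to a `tanh` head for continuous columns (values on a normalized [-1, 1] scale) and one softmax head per categorical column. This is not TGAN's actual design. The function names, noise and layer sizes, and activations are hypothetical, and the weights are random rather than trained against a discriminator.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def generate_rows(n_rows, n_continuous, category_sizes, hidden=16, seed=0):
    """Hypothetical toy generator: noise -> ReLU hidden layer -> one output
    head per column type. Weights are random; a GAN would train them."""
    rng = np.random.default_rng(seed)
    noise_dim = 8
    z = rng.normal(size=(n_rows, noise_dim))
    w_hidden = rng.normal(scale=0.5, size=(noise_dim, hidden))
    h = np.maximum(z @ w_hidden, 0.0)  # shared ReLU hidden layer

    # Continuous head: tanh bounds each value to [-1, 1] (a normalized scale).
    w_cont = rng.normal(scale=0.5, size=(hidden, n_continuous))
    continuous = np.tanh(h @ w_cont)

    # Discrete heads: one softmax per categorical column, giving each row
    # a probability distribution over that column's categories.
    discrete = []
    for k in category_sizes:
        w_cat = rng.normal(scale=0.5, size=(hidden, k))
        discrete.append(softmax(h @ w_cat))
    return continuous, discrete


# Usage: 4 rows, 2 continuous columns, categorical columns with 3 and 5 levels.
cont, disc = generate_rows(n_rows=4, n_continuous=2, category_sizes=[3, 5])
print(cont.shape)                # (4, 2)
print([d.shape for d in disc])   # [(4, 3), (4, 5)]
labels = [d.argmax(axis=1) for d in disc]  # hard category per row, if needed
```

Emitting a soft distribution instead of a hard category keeps the generator differentiable, which adversarial training requires. A concrete category can still be read off with `argmax` after generation.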