LG · AI · Apr 9, 2023

Distributed Conditional GAN (discGAN) For Synthetic Healthcare Data Generation

arXiv:2304.04290v1 · 3 citations · h-index: 6
Originality: Synthesis-oriented
AI Analysis

The paper addresses the scarcity of healthcare data available to researchers, but its contribution is incremental: it applies an existing GAN framework to a new domain.

The paper tackles the generation of synthetic tabular healthcare data by proposing discGAN, a distributed GAN. It generates 249,000 synthetic records from 2,027 real eICU records and uses statistical tests to show that the synthetic distributions resemble those of the real data.

In this paper, we propose a distributed Generative Adversarial Network (discGAN) to generate synthetic tabular data specific to the healthcare domain. While using GANs to generate images has been well studied, little to no attention has been given to the generation of tabular data. Modeling the distributions of discrete and continuous tabular data is a non-trivial task with high utility. We applied discGAN to model non-Gaussian, multi-modal healthcare data, generating 249,000 synthetic records from the original eICU dataset of 2,027 records. We evaluated the model's performance using machine learning efficacy, the Kolmogorov-Smirnov (KS) test for continuous variables, and the chi-squared test for discrete variables. Our results show that discGAN was able to generate data with distributions similar to those of the real data.
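The abstract names two per-column fidelity checks. As a minimal sketch of what they measure, assuming a two-sample KS test on each continuous column and a chi-squared goodness-of-fit test of synthetic category counts against the real-data proportions on each discrete column (the paper's exact test setup is not described here), both statistics can be computed with the Python standard library alone. The function names and toy values are hypothetical; in practice scipy.stats.ks_2samp and scipy.stats.chisquare return these statistics together with p-values.

```python
import math
from bisect import bisect_right
from collections import Counter


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic D: the largest absolute gap
    between the empirical CDFs of a real and a synthetic continuous column."""
    a, b = sorted(real), sorted(synthetic)
    d = 0.0
    # The gap between two step-function CDFs can only change at observed
    # values, so checking every pooled sample point finds the supremum.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of real values <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of synthetic values <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value for D; reject 'same distribution' above it."""
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)  # about 1.358 for alpha = 0.05
    return c_alpha * math.sqrt((n + m) / (n * m))


def chi_squared_statistic(real, synthetic):
    """Chi-squared goodness-of-fit statistic for a discrete column: synthetic
    category counts versus counts expected from the real-data proportions.
    Returns (statistic, degrees_of_freedom)."""
    real_counts = Counter(real)
    synth_counts = Counter(synthetic)
    n_real, n_synth = len(real), len(synthetic)
    stat = 0.0
    for category, count in real_counts.items():
        expected = n_synth * count / n_real
        observed = synth_counts.get(category, 0)
        stat += (observed - expected) ** 2 / expected
    # Categories that appear only in the synthetic data have an expected count
    # of zero and are skipped here; a real audit should flag them separately.
    return stat, len(real_counts) - 1


# Usage with toy values (not eICU data). The large-sample KS approximation
# is rough for samples this small; it is used here only for illustration.
real_age = [38, 45, 50, 62, 70]
synth_age = [40, 47, 51, 60, 72]
d = ks_statistic(real_age, synth_age)
print(d, d <= ks_critical_value(len(real_age), len(synth_age)))

real_sex = ["F", "M", "M", "F"]
synth_sex = ["F"] * 8 + ["M"] * 2
print(chi_squared_statistic(real_sex, synth_sex))
```

For the toy discrete column the statistic is 3.6 with 1 degree of freedom, just under the 5% critical value of about 3.84, so this check would not reject the hypothesis that the synthetic column follows the real proportions. With 249,000 synthetic records, even small distributional gaps tend to become statistically significant, which is one reason to report the statistics themselves alongside any pass/fail decision.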
