Boosting Deep Learning Risk Prediction with Generative Adversarial Networks for Electronic Health Records
This work addresses data scarcity in healthcare prediction models, but it is incremental as it combines existing methods (GANs and CNNs) for a specific domain.
The authors tackled the problem of limited labeled data for deep learning risk prediction in Electronic Health Records by proposing a framework that uses a generative adversarial network (ehrGAN) to augment training data, achieving significant improvements in classification tasks over state-of-the-art baselines.
The rapid growth of Electronic Health Records (EHRs), together with the accompanying opportunities in Data-Driven Healthcare (DDH), has attracted widespread interest and attention. Recent progress in the design and application of deep learning methods has shown promising results and is driving major changes in healthcare academia and industry, but most of these methods rely on massive amounts of labeled data. In this work, we propose a general deep learning framework that boosts risk prediction performance with limited EHR data. Our model uses a modified generative adversarial network, ehrGAN, which provides plausible labeled EHR data by mimicking real patient records, to augment the training dataset in a semi-supervised learning manner. We use this generative model together with a convolutional neural network (CNN) based prediction model to improve onset prediction performance. Experiments on two real healthcare datasets demonstrate that our proposed framework produces realistic data samples and that training with the generated data achieves significant improvements on classification tasks over several state-of-the-art baselines.
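The augmentation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the trained ehrGAN generator is replaced by a hypothetical stand-in, `toy_generator`, that randomly flips entries of a real binary record, and the assumptions that each synthetic record inherits the label of the record it was derived from and that a record is a flat binary vector are made here for illustration only. The names `augment`, `flip_prob`, and `n_synthetic_per_record` are likewise hypothetical.

```python
import random


def toy_generator(real_record, rng, flip_prob=0.1):
    """Hypothetical stand-in for a trained generator such as ehrGAN.

    Produces a synthetic record by flipping each binary entry of a real
    record with probability flip_prob. A real GAN would learn this mapping.
    """
    return [1 - x if rng.random() < flip_prob else x for x in real_record]


def augment(records, labels, n_synthetic_per_record, rng):
    """Return the real data followed by synthetic neighbours of each record.

    Assumption: a synthetic record keeps the label of its source record.
    """
    aug_records = list(records)
    aug_labels = list(labels)
    for record, label in zip(records, labels):
        for _ in range(n_synthetic_per_record):
            aug_records.append(toy_generator(record, rng))
            aug_labels.append(label)
    return aug_records, aug_labels


# Toy data: three patients, six binary event indicators each (made up).
rng = random.Random(0)
real_records = [
    [1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
]
real_labels = [1, 0, 1]

aug_records, aug_labels = augment(real_records, real_labels, 2, rng)
print(len(aug_records), "training records after augmentation")
```

In the framework the abstract describes, the augmented set would then be used to train the CNN-based prediction model; that training step is omitted here.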