A Semi-Supervised Generative Adversarial Network for Prediction of Genetic Disease Outcomes
This work addresses the problem of expensive and time-consuming labeled data collection for genetic disease prediction, offering a practical solution for researchers and clinicians, though it appears incremental as it builds on existing GAN architectures.
The paper tackles the challenge of predicting genetic disease outcomes from limited labeled data by introducing a semi-supervised generative adversarial network (gGAN) that generates synthetic genetic datasets from a small amount of labeled data and a large amount of unlabeled data, achieving satisfactory results across diverse populations and datasets.
For most diseases, building large databases of labeled genetic data is expensive and time-consuming. To address this, we introduce genetic Generative Adversarial Networks (gGAN), a semi-supervised approach based on an innovative GAN architecture that creates large synthetic genetic datasets from a small amount of labeled data and a large amount of unlabeled data. Our goal is to determine, from the genetic profile alone, the propensity of a new individual to develop the severe form of the illness. The proposed model achieved satisfactory results on real genetic data from different datasets and populations, including test populations whose genetic profiles may differ from those of the training data. The proposed model is also self-aware: it can determine whether a new genetic profile is sufficiently compatible with the data on which the network was trained, and is thus suitable for prediction. The code and datasets used can be found at https://github.com/caio-davi/gGAN.
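The abstract does not say how the compatibility check is implemented. As a minimal sketch only, one common pattern in semi-supervised GANs is a discriminator that outputs one logit per outcome class plus one extra logit for "synthetic or incompatible" inputs, so that prediction can be declined when the extra class dominates. The function name `predict_or_abstain`, the labels, the logit layout, and the 0.5 threshold are all assumptions for illustration and are not taken from the gGAN code.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_abstain(
    logits: Sequence[float],
    labels: Sequence[str],
    max_fake_prob: float = 0.5,  # hypothetical threshold, not from the paper
) -> Optional[str]:
    """Return a class label, or None when the profile looks incompatible.

    Assumed layout: `logits` holds one score per label, followed by a
    final score for the extra "synthetic/incompatible" class.
    """
    if len(logits) != len(labels) + 1:
        raise ValueError("expected one logit per label plus one extra logit")

    probs = softmax(logits)
    fake_prob = probs[-1]

    # Abstain when the extra class carries too much probability mass.
    if fake_prob > max_fake_prob:
        return None

    # Otherwise pick the most probable outcome class.
    class_probs = probs[:-1]
    best = max(range(len(labels)), key=lambda i: class_probs[i])
    return labels[best]


if __name__ == "__main__":
    labels = ["mild", "severe"]  # illustrative outcome classes
    print(predict_or_abstain([0.2, 2.5, -1.0], labels))
    print(predict_or_abstain([0.1, 0.3, 3.0], labels))
```

In this sketch, abstaining returns `None` rather than a low-confidence label, which keeps the "not suitable for prediction" case explicit for the caller.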