RadialGAN: Leveraging multiple datasets to improve target-specific predictive models using Generative Adversarial Networks
This paper addresses data scarcity in predictive modeling, particularly for medical applications, though the contribution appears incremental because it builds on existing GAN-based data translation approaches.
The paper tackles the problem of training predictive models when target data is scarce by proposing RadialGAN, which uses multiple GAN architectures to translate data from external related datasets to enlarge the target dataset. The result is improved prediction performance on the target domain compared to using only the target dataset, with the method outperforming benchmarks on real-world medical datasets.
Training complex machine learning models for prediction often requires a large amount of data that is not always readily available. Leveraging external datasets from related but different sources is therefore an important task if good predictive models are to be built for deployment in settings where data is scarce. In this paper we propose a novel approach to the problem in which we use multiple GAN architectures to learn to translate from one dataset to another, thereby allowing us to effectively enlarge the target dataset and learn better predictive models than if we used the target dataset alone. We demonstrate the utility of this approach by showing that our method improves prediction performance on the target domain over using just the target dataset, and that our framework outperforms several other benchmarks on a collection of real-world medical datasets.
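The data flow described above, translating rows from related datasets into the target domain and pooling them with the target data, can be sketched as follows. This is only an illustration of the augmentation step, not the paper's implementation: the function `enlarge_target`, the dataset names, and the hand-written translator functions are all hypothetical stand-ins for generators that would in practice be learned adversarially.

```python
from typing import Callable, Dict, List, Sequence

Row = List[float]
Translator = Callable[[Row], Row]


def enlarge_target(
    target_rows: Sequence[Row],
    source_datasets: Dict[str, Sequence[Row]],
    translators: Dict[str, Translator],
) -> List[Row]:
    """Return the target rows followed by translated rows from every source.

    Each translator is a hypothetical stand-in for a trained generator that
    maps one source dataset's feature space into the target feature space.
    """
    if not target_rows:
        raise ValueError("target_rows must contain at least one row")
    width = len(target_rows[0])
    enlarged = [list(row) for row in target_rows]
    for name, rows in source_datasets.items():
        translate = translators[name]
        for row in rows:
            translated = translate(row)
            # A translated row must match the target feature layout.
            if len(translated) != width:
                raise ValueError(f"translator for {name!r} returned wrong width")
            enlarged.append(list(translated))
    return enlarged


# Toy data: two target rows and two source datasets with different feature sets.
target = [[0.25, 1.0], [0.75, 0.0]]
sources = {
    "source_a": [[1.0, 1.0, 3.0]],   # three features, assumed layout
    "source_b": [[0.5], [2.0]],      # one feature, assumed layout
}
# Assumed, hand-written mappings; a real system would learn these.
translators = {
    "source_a": lambda r: [r[0] / 2, r[1]],
    "source_b": lambda r: [r[0], 0.0],
}
enlarged = enlarge_target(target, sources, translators)
print(len(enlarged))
```

A predictive model would then be trained on `enlarged` rather than on `target` alone. How well this works depends entirely on the quality of the learned translators, which the sketch does not address.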