Ensembles of GANs for synthetic training data generation
This work addresses data scarcity for researchers and practitioners in fields like medical imaging, but the contribution is incremental because it builds on existing GAN methods.
The paper tackles the problem of insufficient training data in deep learning, particularly in medical imaging, by using ensembles of GANs to generate synthetic training data. It shows that ensembles improve the diversity of the generated data and reduce over-fitting compared to a single GAN, with tests on synthetic 2D data and on the SVHN and CIFAR-10 image datasets.
Insufficient training data is a major bottleneck for most deep learning practices, not least in medical imaging, where data is difficult to collect and publicly available datasets are scarce due to ethics and privacy. This work investigates the use of synthetic images, created by generative adversarial networks (GANs), as the only source of training data. We demonstrate that for this application, it is of great importance to make use of multiple GANs to improve the diversity of the generated data, i.e. to sufficiently cover the data distribution. While a single GAN can generate seemingly diverse image content, training on this data in most cases leads to severe over-fitting. We test the impact of ensembled GANs on synthetic 2D data as well as on common image datasets (SVHN and CIFAR-10), using both DCGANs and progressively growing GANs. As a specific use case, we focus on synthesizing digital pathology patches to provide anonymized training data.
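As a rough illustration of why pooling several generators can widen coverage of the data distribution, the following Python sketch replaces trained GANs with hypothetical stand-in samplers that each cover only a few modes of a 2D mixture. The names `make_collapsed_generator`, `sample_ensemble`, and `covered_modes`, the 3x3 grid of modes, and the uniform choice of generator per sample are all assumptions made for illustration, not details taken from the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
SampleFn = Callable[[random.Random], Point]

# Assumed toy target: nine well-separated modes on a 3x3 grid.
MODES: List[Point] = [(float(x), float(y)) for x in range(3) for y in range(3)]


def make_collapsed_generator(covered: Sequence[int], noise: float = 0.05) -> SampleFn:
    """Hypothetical stand-in for a trained GAN that only reaches some modes."""
    def sample(rng: random.Random) -> Point:
        cx, cy = MODES[rng.choice(covered)]
        return (cx + rng.gauss(0.0, noise), cy + rng.gauss(0.0, noise))
    return sample


def sample_ensemble(generators: Sequence[SampleFn], n: int, rng: random.Random) -> List[Point]:
    """Draw n points, choosing one generator uniformly at random per point."""
    return [rng.choice(generators)(rng) for _ in range(n)]


def covered_modes(points: Sequence[Point], radius: float = 0.3) -> int:
    """Count the modes that have at least one sample within `radius`."""
    hit = set()
    for px, py in points:
        for i, (cx, cy) in enumerate(MODES):
            if (px - cx) ** 2 + (py - cy) ** 2 <= radius ** 2:
                hit.add(i)
    return len(hit)


rng = random.Random(0)
# Five stand-in generators, each restricted to 4 randomly chosen modes.
gens = [make_collapsed_generator(rng.sample(range(len(MODES)), 4)) for _ in range(5)]

single = sample_ensemble(gens[:1], 2000, rng)   # one generator only
ensemble = sample_ensemble(gens, 2000, rng)     # pooled over all five

print("modes covered by a single generator:", covered_modes(single))
print("modes covered by the ensemble:      ", covered_modes(ensemble))
```

In this toy setting the single stand-in can never reach more than its 4 modes, while the pooled samples cover the union of the modes reached by the individual generators. A real experiment would replace the stand-ins with independently trained GANs and measure coverage through the performance of a classifier trained on the generated data.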