Selective Synthetic Augmentation with Quality Assurance
This work addresses data scarcity and class imbalance in medical imaging and other domains by offering a quality-assured method for synthetic data augmentation. The contribution is incremental, since it builds on existing cGAN techniques.
The paper tackles the problem of limited and imbalanced expert-annotated data for training medical image analysis systems. It introduces a selective synthetic augmentation pipeline that uses conditional GANs (cGANs) to generate synthetic images and then selects among them based on label confidence and feature similarity to real images. The reported classification accuracy improvements are 6.8%, 3.9%, and 1.6% on a medical histopathology dataset, CIFAR10, and SVHN, respectively.
Supervised training of an automated medical image analysis system often requires a large number of expert annotations, which are hard to collect. Moreover, the proportions of data available across different classes may be highly imbalanced, particularly for rare diseases. To mitigate these issues, we investigate a novel data augmentation pipeline that selectively adds new synthetic images generated by conditional Generative Adversarial Networks (cGANs), rather than directly extending the training set with all synthetic images. The selection mechanisms that we introduce to the synthetic augmentation pipeline are motivated by the observation that, although cGAN-generated images can be visually appealing, they are not guaranteed to contain the features essential for improving classification performance. By selecting synthetic images based on the confidence of their assigned labels and their feature similarity to real labeled images, our framework provides quality assurance to synthetic augmentation by ensuring that adding the selected synthetic images to the training set will improve performance. We evaluate our model on a medical histopathology dataset and two natural image classification benchmarks, CIFAR10 and SVHN. Results on these datasets show significant and consistent improvements in classification performance (with 6.8%, 3.9%, and 1.6% higher accuracy, respectively) by leveraging cGAN-generated images with selective augmentation.
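The two selection criteria described above (label confidence and feature similarity to real labeled images) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how confidence or similarity is measured. The sketch assumes that confidence is a classifier's predicted probability for the label the cGAN was conditioned on, and that similarity is the Euclidean distance to the per-class mean of real-image features. The function names `class_centroids` and `select_synthetic` and the thresholds `min_confidence` and `max_distance` are hypothetical.

```python
import math


def class_centroids(real_feats, real_labels):
    """Mean feature vector of the real images in each class (an assumed similarity reference)."""
    sums, counts = {}, {}
    for feat, label in zip(real_feats, real_labels):
        acc = sums.setdefault(label, [0.0] * len(feat))
        for i, value in enumerate(feat):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def select_synthetic(synth_probs, synth_labels, synth_feats, centroids,
                     min_confidence=0.9, max_distance=1.0):
    """Return indices of synthetic images that pass both filters.

    synth_probs[i]  : class probabilities a classifier assigns to synthetic image i
    synth_labels[i] : label the generator was conditioned on for image i
    synth_feats[i]  : feature vector of image i (e.g. from a classifier's hidden layer)
    Both thresholds are hypothetical values chosen for illustration.
    """
    kept = []
    for idx, (probs, label, feat) in enumerate(
            zip(synth_probs, synth_labels, synth_feats)):
        # Filter 1: the classifier must be confident in the assigned label.
        if probs[label] < min_confidence:
            continue
        # Filter 2: the features must lie close to real images of the same class.
        if math.dist(feat, centroids[label]) > max_distance:
            continue
        kept.append(idx)
    return kept


# Toy data: two classes with 2-D features.
real_feats = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
real_labels = [0, 0, 1]
centroids = class_centroids(real_feats, real_labels)

synth_probs = [[0.95, 0.05], [0.60, 0.40], [0.97, 0.03], [0.02, 0.98]]
synth_labels = [0, 0, 0, 1]
synth_feats = [[1.2, 0.1], [1.0, 0.0], [5.0, 5.0], [10.5, 10.0]]

selected = select_synthetic(synth_probs, synth_labels, synth_feats, centroids)
print(selected)
```

In this toy run, the second synthetic image is rejected for low label confidence and the third for lying far from the real class-0 features, so only the first and fourth would be added to the training set. Only the selected indices would then be used to extend the real training data.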