Sampling Strategies for GAN Synthetic Data
This work addresses the challenge of using synthetic data effectively for data augmentation in deep learning, particularly for face attribute classification. The contribution is incremental, since it builds on existing GAN and augmentation methods.
The paper tackles the problem that GAN-generated synthetic data lack the realism and diversity needed to train deep CNNs. It proposes sampling strategies, such as using discriminator confidence scores and reinforcement learning, to select meaningful subsets of the synthetic data, and it reports improved performance on the face attribute classification datasets AffectNet and CelebA.
Generative Adversarial Networks (GANs) have been widely used to generate large volumes of synthetic data. These data are used to augment real examples when training deep Convolutional Neural Networks (CNNs). Studies have shown that the generated examples lack sufficient realism to train deep CNNs and are poor in diversity. Unlike previous studies, which randomly augment real data with synthetic data, we present simple, effective, and easy-to-implement synthetic data sampling methods to train deep CNNs more efficiently and accurately. To this end, we propose to make maximal use of the parameters learned during training of the GAN itself. These include the discriminator's realism confidence score and its confidence on the target label of the synthetic data. In addition, we explore reinforcement learning (RL) to automatically search for a subset of meaningful synthetic examples within a large pool of GAN synthetic data. We evaluate our method on two challenging face attribute classification datasets, namely AffectNet and CelebA. Our extensive experiments clearly demonstrate the need for sampling synthetic data before augmentation, which also improves the performance of one of the state-of-the-art deep CNNs in vitro.
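The confidence-based sampling idea can be illustrated with a minimal sketch. It assumes each synthetic example already has two precomputed scores in [0, 1]: the discriminator's realism confidence and the confidence on the target label. The names `SyntheticExample`, `combined_score`, `select_top_k`, and `select_by_threshold` are hypothetical, and combining the two scores by their product is an assumption made for illustration, since the abstract does not state how the scores are combined. The RL-based search is not covered here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SyntheticExample:
    """One GAN-generated example with scores assumed to be precomputed."""
    example_id: int
    realism: float           # discriminator's realism confidence, assumed in [0, 1]
    label_confidence: float  # confidence on the target label, assumed in [0, 1]


def combined_score(ex: SyntheticExample) -> float:
    # Assumption: use the product of the two confidences as a single ranking score.
    return ex.realism * ex.label_confidence


def select_top_k(pool: Sequence[SyntheticExample], k: int) -> List[SyntheticExample]:
    """Return the k highest-scoring examples; ties are broken by example_id."""
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(pool, key=lambda ex: (-combined_score(ex), ex.example_id))
    return ranked[:k]


def select_by_threshold(
    pool: Sequence[SyntheticExample],
    min_realism: float,
    min_label_confidence: float,
) -> List[SyntheticExample]:
    """Keep examples that pass both confidence thresholds, preserving pool order."""
    return [
        ex
        for ex in pool
        if ex.realism >= min_realism and ex.label_confidence >= min_label_confidence
    ]


# Toy pool with made-up scores, only to exercise the two selection rules.
pool = [
    SyntheticExample(0, 0.90, 0.80),
    SyntheticExample(1, 0.30, 0.95),
    SyntheticExample(2, 0.85, 0.40),
    SyntheticExample(3, 0.70, 0.90),
]
top2 = select_top_k(pool, 2)
kept = select_by_threshold(pool, min_realism=0.6, min_label_confidence=0.5)
print([ex.example_id for ex in top2])
print([ex.example_id for ex in kept])
```

In this sketch the selected subset, rather than the full synthetic pool, would be mixed with the real training set before training the CNN. Top-k selection fixes the augmentation budget, while thresholding fixes a minimum quality and lets the subset size vary.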