HumanGAN: generative adversarial network with human-based discriminator and its evaluation in speech perception modeling
This work addresses the challenge of capturing human perceptual variability in generative models for speech processing. It is an incremental extension of the basic GAN in which human judgments take the place of the learned discriminator.
The paper tackles the problem of modeling human-acceptable distributions in speech perception, which are wider than real-data distributions. It proposes HumanGAN, a GAN with a human-based discriminator, and demonstrates that HumanGAN can represent such distributions in speech naturalness modeling.
We propose HumanGAN, a generative adversarial network (GAN) that incorporates human perception as its discriminator. A basic GAN trains a generator to represent a real-data distribution by fooling a discriminator that distinguishes real data from generated data. The basic GAN therefore cannot represent anything outside the real-data distribution. In speech perception, however, humans recognize not only real human voices but also processed voices (i.e., voices of non-existent humans) as human voices. Such a human-acceptable distribution is typically wider than the real-data distribution and cannot be modeled by the basic GAN. To model the human-acceptable distribution, we formulate a backpropagation-based generator training algorithm that treats human perception as a black-box discriminator. Training alternates efficiently between generator updates performed on a computer and discrimination performed by crowdsourcing. We evaluate HumanGAN in speech naturalness modeling and demonstrate that it can represent a human-acceptable distribution that is wider than the real-data distribution.
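The abstract does not spell out how a gradient is obtained from a discriminator that cannot be differentiated. One generic option, shown here only as a minimal sketch and not necessarily the paper's formulation, is to estimate the gradient by querying the black box on slightly perturbed samples and then applying the chain rule through the generator. Everything in the code is an assumption made for illustration: the function `simulated_human_posterior` is a hypothetical stand-in for crowdsourced judgments, the generator is a toy linear map, and the perturbation count, step size, and acceptance radius are arbitrary.

```python
import numpy as np


def simulated_human_posterior(x):
    """Hypothetical stand-in for crowdsourced judgments (an assumption).

    Returns a value in (0, 1): close to 1 inside a wide "acceptable" region
    of radius 3 around the origin, close to 0 far outside it.
    """
    radius = np.linalg.norm(x, axis=-1)
    return 1.0 / (1.0 + np.exp(2.0 * (radius - 3.0)))


def estimate_gradient(x, query, rng, num_perturbations=50, sigma=0.1):
    """Estimate d query / d x from black-box queries only.

    Uses symmetric random perturbations (a finite-difference style
    estimator); no derivative of `query` is ever evaluated.
    """
    eps = rng.standard_normal((num_perturbations, x.shape[0]))
    diff = query(x + sigma * eps) - query(x - sigma * eps)
    return (diff[:, None] * eps).mean(axis=0) / (2.0 * sigma)


def train_generator(steps=200, batch=16, dim=2, lr=0.5, seed=0):
    """Train a toy linear generator x = W z + b by gradient ascent
    on the black-box posterior."""
    rng = np.random.default_rng(seed)
    W = np.eye(dim)
    b = np.full(dim, 3.0)  # start mostly outside the acceptable region
    for _ in range(steps):
        z = rng.standard_normal((batch, dim))
        x = z @ W.T + b
        # "Discrimination" step: query the black box around each sample.
        grads = np.stack(
            [estimate_gradient(xi, simulated_human_posterior, rng) for xi in x]
        )
        # "Generator" step: chain rule through x = W z + b.
        W += lr * (grads.T @ z) / batch
        b += lr * grads.mean(axis=0)
    return W, b


if __name__ == "__main__":
    W, b = train_generator()
    z = np.random.default_rng(1).standard_normal((500, 2))
    print("mean posterior before:", simulated_human_posterior(z + 3.0).mean())
    print("mean posterior after: ", simulated_human_posterior(z @ W.T + b).mean())
```

In this toy setting the estimated gradient is close to zero once samples lie inside the acceptable region, so nothing in the objective pulls them toward a single point. In a real system each call to the query function would correspond to a batch of human judgments, which makes the number of perturbations per sample the main cost to control.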