CL CV SD AS

Aug 13, 2018

Towards Audio to Scene Image Synthesis using Generative Adversarial Network

Chia-Hung Wan, Shun-Po Chuang, Hung-Yi Lee

arXiv:1808.04108v14.370 citations

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of audio-to-image synthesis for multimedia and AI applications, but the advance is incremental because it builds on existing GAN techniques.

The paper tackles the problem of generating scene images from audio inputs using conditional generative adversarial networks (GANs). It achieves better image quality than a naive conditional GAN, and about three-quarters of human raters agree that the generated images relate to the input sounds.

Humans can imagine a scene from a sound. We want machines to do the same using conditional generative adversarial networks (GANs). By applying techniques including spectral normalization, a projection discriminator, and an auxiliary classifier, our model generates images of better quality than a naive conditional GAN in both subjective and objective evaluations. Almost three-fourths of people agree that our model has the ability to generate images related to sounds. When given the same sound at different volumes, our model outputs changes of different scales according to the volume, showing that the model has, to some extent, learned the relationship between sounds and images.
