SD · LG · Feb 12, 2018

Adversarial Audio Synthesis

arXiv:1802.04208v3 · 701 citations
Originality: Incremental advance
AI Analysis

This work addresses audio synthesis for sound effect generation and other audio domains. It is an incremental advance: it adapts GANs from image generation to audio generation.

The paper tackles raw-waveform audio generation with generative adversarial networks (GANs), which had seen little prior application to audio synthesis. It shows that WaveGAN can produce one-second audio slices with global coherence. These include intelligible words when it is trained on a small-vocabulary speech dataset, as well as sounds from domains such as drums and bird vocalizations.
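A one-second raw waveform contains thousands of samples, so a generator has to bridge many timescales. The sketch below illustrates the length arithmetic only. It assumes a 16 kHz sampling rate and a hypothetical generator that upsamples by a factor of 4 per transposed-convolution layer, with padding chosen so that each layer's output length equals its input length times the stride. Neither value comes from the summary above. Under these assumptions, five layers grow a length-16 sequence to 16,384 samples, which is about one second of audio.

```python
# Illustrative sketch only: SAMPLE_RATE and STRIDE are assumed values,
# not figures taken from the summary text.
SAMPLE_RATE = 16_000  # assumed sampling rate in Hz
STRIDE = 4            # assumed upsampling factor per layer


def transposed_conv_length(length_in: int, stride: int) -> int:
    """Output length of a transposed 1-D conv whose padding makes out = in * stride."""
    return length_in * stride


def upsample_schedule(start_len: int, num_layers: int, stride: int = STRIDE) -> list[int]:
    """Sequence length after each of `num_layers` upsampling layers (hypothetical helper)."""
    lengths = [start_len]
    for _ in range(num_layers):
        lengths.append(transposed_conv_length(lengths[-1], stride))
    return lengths


schedule = upsample_schedule(start_len=16, num_layers=5)
print(schedule)                    # [16, 64, 256, 1024, 4096, 16384]
print(schedule[-1] / SAMPLE_RATE)  # 1.024 (seconds of audio)
```

Each layer multiplies the temporal resolution by 4. A handful of layers is therefore enough to move from a coarse, slowly varying sequence to sample-level detail.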

Audio signals are sampled at high temporal resolutions, and learning to synthesize audio requires capturing structure across a range of timescales. Generative adversarial networks (GANs) have seen wide success at generating images that are both locally and globally coherent, but they have seen little application to audio generation. In this paper we introduce WaveGAN, a first attempt at applying GANs to unsupervised synthesis of raw-waveform audio. WaveGAN is capable of synthesizing one-second slices of audio waveforms with global coherence, suitable for sound effect generation. Our experiments demonstrate that, without labels, WaveGAN learns to produce intelligible words when trained on a small-vocabulary speech dataset, and can also synthesize audio from other domains such as drums, bird vocalizations, and piano. We compare WaveGAN to a method which applies GANs designed for image generation on image-like audio feature representations, finding both approaches to be promising.
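The abstract contrasts raw-waveform generation with applying image GANs to "image-like audio feature representations." The NumPy sketch below gives a rough idea of what such a representation can look like by computing a log-magnitude short-time Fourier transform. The frame length, hop size, Hann window, 16 kHz rate, and 440 Hz test tone are all illustrative assumptions, not the paper's preprocessing.

```python
import numpy as np


def log_magnitude_spectrogram(x: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Turn a 1-D waveform into a 2-D (frames x frequency bins) log-magnitude array.

    frame_len and hop are assumed values chosen for illustration.
    """
    num_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice overlapping frames and taper each one with the window.
    frames = np.stack([x[i * hop : i * hop + frame_len] * window for i in range(num_frames)])
    # Real FFT per frame: shape (num_frames, frame_len // 2 + 1).
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    # Small offset avoids log(0) in silent bins.
    return np.log(magnitudes + 1e-6)


sr = 16_000                        # assumed sampling rate in Hz
t = np.arange(16_384) / sr         # about one second of sample times
wave = np.sin(2 * np.pi * 440.0 * t)
spec = log_magnitude_spectrogram(wave)
print(spec.shape)  # (127, 129)
```

Each row is one time frame and each column is one frequency bin. An image-style convolutional GAN could therefore treat the array like a single-channel picture. The log magnitude discards phase, so turning a generated array of this kind back into audio requires an extra phase-recovery step that this sketch does not cover.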

Code Implementations: 22 repos

Data from Papers with Code (CC-BY-SA-4.0)
