SD · LG · AS · Jun 11, 2021

Catch-A-Waveform: Learning to Generate Audio from a Single Short Example

arXiv:2106.06426v2 · 29 citations
Originality: Incremental advance
AI Analysis

The method enables audio generation and manipulation in domains such as music and speech without large training datasets. The advance is incremental, since it builds on existing GAN methods.

The paper addresses audio generation from minimal training data. It shows that a GAN-based model trained on a single audio example of no more than 20 seconds can produce new samples that are semantically similar to the training signal, and it reports state-of-the-art results in applications such as music generation and audio enhancement.

Models for audio generation are typically trained on hours of recordings. Here, we illustrate that capturing the essence of an audio source is typically possible from as little as a few tens of seconds from a single training signal. Specifically, we present a GAN-based generative model that can be trained on one short audio signal from any domain (e.g. speech, music, etc.) and does not require pre-training or any other form of external supervision. Once trained, our model can generate random samples of arbitrary duration that maintain semantic similarity to the training waveform, yet exhibit new compositions of its audio primitives. This enables a long line of interesting applications, including generating new jazz improvisations or new a-cappella rap variants based on a single short example, producing coherent modifications to famous songs (e.g. adding a new verse to a Beatles song based solely on the original recording), filling-in of missing parts (inpainting), extending the bandwidth of a speech signal (super-resolution), and enhancing old recordings without access to any clean training example. We show that in all cases, no more than 20 seconds of training audio commonly suffice for our model to achieve state-of-the-art results. This is despite its complete lack of prior knowledge about the nature of audio signals in general.

Code Implementations: 1 repo
