SD LG AS, Oct 14, 2021

SpecSinGAN: Sound Effect Variation Synthesis Using Single-Image GANs

arXiv:2110.07311v2
Originality: Incremental advance
AI Analysis

This paper addresses the need for efficient sound effect synthesis in audio production, offering an incremental improvement over existing methods by applying single-image GANs to audio.

The paper tackles the problem of generating varied sound effects from a single example. It introduces SpecSinGAN, which is trained on multi-channel spectrograms of one sound effect and produces novel variations of it. In a listening study, SpecSinGAN was found to be more plausible and varied than the procedural audio models considered, when using multi-channel spectrograms.

Single-image generative adversarial networks learn from the internal distribution of a single training example to generate variations of it, removing the need for a large dataset. In this paper we introduce SpecSinGAN, an unconditional generative architecture that takes a single one-shot sound effect (e.g., a footstep; a character jump) and produces novel variations of it, as if they were different takes from the same recording session. We explore the use of multi-channel spectrograms to train the model on the various layers that comprise a single sound effect. A listening study comparing our model to real recordings and to digital signal processing procedural audio models in terms of sound plausibility and variation revealed that SpecSinGAN is more plausible and varied than the procedural audio models considered, when using multi-channel spectrograms. Sound examples can be found at the project website: https://www.adrianbarahonarios.com/specsingan/
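
The abstract does not say how the multi-channel spectrograms are built, so the following is only a rough sketch of the general idea: each layer of a sound effect becomes one channel of a single image-like array that a single-image GAN could train on. The function names, the STFT settings (`n_fft`, `hop`), the `log1p` scaling, and the two synthetic layers are all assumptions made for illustration, not details from the paper.

```python
import numpy as np

def log_magnitude_spectrogram(signal, n_fft=512, hop=128):
    """Log-magnitude STFT of a mono signal, shape (n_fft // 2 + 1, frames).

    Hypothetical helper: the paper's actual spectrogram settings are not
    given in the abstract, so n_fft, hop and log1p scaling are assumptions.
    """
    window = np.hanning(n_fft)
    # Zero-pad very short signals so at least one full frame exists.
    if len(signal) < n_fft:
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return np.log1p(magnitude)

def stack_layers(layers, n_fft=512, hop=128):
    """Stack one spectrogram per layer into a (channels, freq, frames) array."""
    # Zero-pad every layer to the longest one so the spectrograms align in time.
    length = max(len(layer) for layer in layers)
    padded = [np.pad(layer, (0, length - len(layer))) for layer in layers]
    return np.stack([log_magnitude_spectrogram(p, n_fft, hop) for p in padded])

# Two synthetic stand-ins for the layers of one sound effect (assumed data):
# a decaying noise burst and a decaying low tone.
sample_rate = 16000
t = np.arange(sample_rate // 4) / sample_rate
rng = np.random.default_rng(0)
impact = rng.standard_normal(t.size) * np.exp(-40 * t)
tone = np.sin(2 * np.pi * 220 * t) * np.exp(-8 * t)

multi_channel = stack_layers([impact, tone])
print(multi_channel.shape)  # (channels, frequency bins, time frames)
```

The resulting array has the same layout as a multi-channel image, which is why an architecture designed for single images can be applied to it.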

