SD · AI · HC · AS · Apr 15, 2021

Spectrogram Inpainting for Interactive Generation of Instrument Sounds

arXiv:2104.07519v1 · 7 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for more controllable sound synthesis tools for musicians, though it is incremental because it adapts existing methods to a new domain.

The paper tackles the problem of controlling deep neural network sound synthesis for musicians by framing instrument note generation as an inpainting task, resulting in an interactive web interface for creative sound transformation.

Modern approaches to sound synthesis using deep neural networks are hard to control, especially when fine-grained conditioning information is not available, hindering their adoption by musicians. In this paper, we cast the generation of individual instrumental notes as an inpainting-based task, introducing novel ways to iteratively shape sounds. To this end, we propose a two-step approach: first, we adapt the VQ-VAE-2 image generation architecture to spectrograms in order to convert real-valued spectrograms into compact discrete codemaps; we then implement token-masked Transformers for the inpainting-based generation of these codemaps. We apply the proposed architecture to the NSynth dataset on masked resampling tasks. Most crucially, we open-source an interactive web interface to transform sounds by inpainting, for artists and practitioners alike, opening up new creative uses.
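
The two-step approach described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`quantize`, `inpaint`, `uniform_probs`), the single-level codemap (VQ-VAE-2 uses more than one level), the raster-order resampling, and the uniform stand-in for the trained Transformer are all assumptions made only to show the idea of nearest-codebook quantization followed by resampling of masked tokens.

```python
import random

def quantize(latents, codebook):
    """Map each latent vector in a 2D grid to the index of its nearest codebook entry."""
    def nearest(vec):
        dists = [sum((a - b) ** 2 for a, b in zip(vec, code)) for code in codebook]
        return dists.index(min(dists))
    return [[nearest(vec) for vec in row] for row in latents]

def inpaint(codemap, mask, predict_probs, rng):
    """Resample only the masked positions of a codemap, leaving the rest untouched.

    predict_probs(codemap, r, c) is a hypothetical stand-in for a trained model:
    it returns a list of probabilities over codebook indices for position (r, c).
    Positions are visited in raster order here, which is an assumption.
    """
    out = [row[:] for row in codemap]  # copy so the input codemap is preserved
    for r, mask_row in enumerate(mask):
        for c, is_masked in enumerate(mask_row):
            if is_masked:
                probs = predict_probs(out, r, c)
                out[r][c] = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return out

def uniform_probs(codemap, r, c):
    """Placeholder predictor: every one of the 3 toy codes is equally likely."""
    return [1 / 3, 1 / 3, 1 / 3]

# Toy data: a 2x2 grid of 2-dimensional latents and a 3-entry codebook.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
latents = [[(0.1, -0.1), (0.9, 1.2)],
           [(0.2, 0.8), (1.1, 0.9)]]

codemap = quantize(latents, codebook)          # discrete codemap of indices
mask = [[False, True], [False, False]]         # region the user chose to regenerate
result = inpaint(codemap, mask, uniform_probs, random.Random(0))
```

In an interactive setting, the mask would correspond to the time-frequency region a user selects, and the resampled codemap would be decoded back to a spectrogram by the decoder half of the autoencoder, which this sketch omits.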

Code Implementations: 1 repo
