SD · AI · AS · Jul 10, 2023

VampNet: Music Generation via Masked Acoustic Token Modeling

arXiv:2307.04686v2 · 94 citations · h-index: 34
AI Analysis

This work addresses music creation and editing for audio-processing users. It offers a flexible co-creation tool and makes incremental improvements to non-autoregressive modeling.

The paper tackles music generation and manipulation with VampNet, a masked acoustic token modeling approach that generates coherent, high-fidelity musical waveforms in just 36 sampling passes and supports tasks such as synthesis, compression, and variation.

We introduce VampNet, a masked acoustic token modeling approach to music synthesis, compression, inpainting, and variation. We use a variable masking schedule during training which allows us to sample coherent music from the model by applying a variety of masking approaches (called prompts) during inference. VampNet is non-autoregressive, leveraging a bidirectional transformer architecture that attends to all tokens in a forward pass. With just 36 sampling passes, VampNet can generate coherent high-fidelity musical waveforms. We show that by prompting VampNet in various ways, we can apply it to tasks like music compression, inpainting, outpainting, continuation, and looping with variation (vamping). Appropriately prompted, VampNet is capable of maintaining style, genre, instrumentation, and other high-level aspects of the music. This flexible prompting capability makes VampNet a powerful music co-creation tool. Code and audio samples are available online.
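To make the abstract's mechanics concrete, here is a minimal sketch of the three ideas it names: a variable masking schedule at training time, prompts expressed as masks, and non-autoregressive sampling in a fixed number of passes (36). It does not reproduce VampNet's implementation. Several simplifications are assumptions: the cosine schedule and confidence-based re-masking are borrowed from MaskGIT-style decoding, tokens are flattened to a single codebook sequence instead of a multi-codebook grid, and `make_toy_predictor` is a hypothetical random stand-in for the bidirectional transformer. The names `inpaint_prompt`, `periodic_prompt`, `training_mask`, and `iterative_decode` are illustrative only.

```python
import math
import random
from typing import Callable, List, Tuple

MASK = -1  # hypothetical sentinel id for a masked token

# A predictor maps a (partially masked) sequence to one (token, confidence) per position.
Predictor = Callable[[List[int]], List[Tuple[int, float]]]


def cosine_schedule(progress: float) -> float:
    """MaskGIT-style schedule (assumed here): fraction still masked at progress in [0, 1]."""
    return math.cos(math.pi / 2 * progress)


def training_mask(n_tokens: int, rng: random.Random) -> List[bool]:
    """Variable masking: draw a fresh mask ratio per example. True means 'keep'."""
    ratio = cosine_schedule(rng.random())
    n_mask = max(1, math.ceil(ratio * n_tokens))
    masked = set(rng.sample(range(n_tokens), n_mask))
    return [i not in masked for i in range(n_tokens)]


def inpaint_prompt(n_frames: int, prefix: int, suffix: int) -> List[bool]:
    """Prompt that keeps the first `prefix` and last `suffix` frames and masks the middle."""
    return [i < prefix or i >= n_frames - suffix for i in range(n_frames)]


def periodic_prompt(n_frames: int, period: int) -> List[bool]:
    """Prompt that keeps every `period`-th frame as an anchor (a vamping-style variation)."""
    return [i % period == 0 for i in range(n_frames)]


def iterative_decode(tokens: List[int], keep: List[bool],
                     predict: Predictor, n_steps: int = 36) -> List[int]:
    """Fill masked positions in `n_steps` parallel passes, committing confident tokens first."""
    seq = [t if k else MASK for t, k in zip(tokens, keep)]
    n_masked0 = seq.count(MASK)
    for step in range(1, n_steps + 1):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        preds = predict(seq)  # one forward pass over all positions at once
        # How many tokens should remain masked after this pass (none after the last).
        target = 0 if step == n_steps else math.floor(n_masked0 * cosine_schedule(step / n_steps))
        ranked = sorted(masked, key=lambda i: preds[i][1], reverse=True)
        for i in ranked[: len(masked) - target]:
            seq[i] = preds[i][0]  # commit the most confident predictions
    return seq


def make_toy_predictor(vocab_size: int = 1024, seed: int = 0) -> Predictor:
    """Hypothetical stand-in for the transformer: random tokens and confidences."""
    rng = random.Random(seed)

    def predict(seq: List[int]) -> List[Tuple[int, float]]:
        return [(rng.randrange(vocab_size), rng.random()) for _ in seq]

    return predict


if __name__ == "__main__":
    rng = random.Random(1)
    codes = [rng.randrange(1024) for _ in range(32)]  # stand-in for codec tokens
    keep = periodic_prompt(len(codes), period=4)
    out = iterative_decode(codes, keep, make_toy_predictor(), n_steps=36)
    print(out)
```

Swapping `periodic_prompt` for `inpaint_prompt` changes the task without retraining, which is the point of the prompting idea in the abstract: the same sampler serves inpainting, continuation, or looping with variation depending only on which tokens are kept.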

Code Implementations: 1 repo
