SD | LG | AS | ML | Mar 30, 2021

Symbolic Music Generation with Diffusion Models

arXiv:2103.16091v2 | 230 citations
AI Analysis

This paper addresses efficient, high-quality music generation for AI and creative applications, and represents an incremental advance in applying diffusion models to discrete domains.

The paper generates symbolic music by training a diffusion model in the continuous latent space of a pre-trained VAE, which lets it handle discrete sequential data. It reports strong unconditional generation and conditional infilling results compared to autoregressive models.

Score-based generative models and diffusion probabilistic models have been successful at generating high-quality samples in continuous domains such as images and audio. However, due to their Langevin-inspired sampling mechanisms, their application to discrete and sequential data has been limited. In this work, we present a technique for training diffusion models on sequential data by parameterizing the discrete domain in the continuous latent space of a pre-trained variational autoencoder. Our method is non-autoregressive and learns to generate sequences of latent embeddings through the reverse process and offers parallel generation with a constant number of iterative refinement steps. We apply this technique to modeling symbolic music and show strong unconditional generation and post-hoc conditional infilling results compared to autoregressive language models operating over the same continuous embeddings.
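
The sampling idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a standard DDPM-style linear noise schedule and ancestral sampler, and it stands in for the trained denoising network with a toy analytic noise predictor (exact only when the latents are standard normal). The function names, sequence length, latent size, and step count are all hypothetical. Infilling is shown with one common approach, assumed here: at each reverse step, the known positions are overwritten with a forward-noised copy of the known latents. In a real pipeline the latents would come from the pre-trained VAE encoder and the result would be passed to its decoder to recover notes.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear variance schedule (a common DDPM default)."""
    return np.linspace(beta_start, beta_end, num_steps)


def q_sample(z0, t, alpha_bar, rng):
    """Forward process: jump from clean latents z0 to noise level t in closed form."""
    noise = rng.standard_normal(z0.shape)
    zt = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return zt, noise


def make_toy_eps_model(alpha_bar):
    """Hypothetical stand-in for a trained network that predicts the added noise.

    If the clean latents are N(0, I), the optimal prediction is sqrt(1 - alpha_bar_t) * z_t.
    """
    def eps_model(z, t):
        return np.sqrt(1.0 - alpha_bar[t]) * z
    return eps_model


def p_sample_loop(eps_model, shape, betas, rng, known=None, mask=None):
    """Reverse process: a fixed number of refinement steps over the whole sequence.

    All sequence positions are updated in parallel at every step (non-autoregressive).
    If `known` and a boolean `mask` are given, masked positions are held to the
    known latents (post-hoc infilling); unmasked positions are generated.
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    z = rng.standard_normal(shape)  # start from pure noise
    for t in range(len(betas) - 1, -1, -1):
        if mask is not None:
            # Paste in the known latents, noised to the current level t.
            known_t, _ = q_sample(known, t, alpha_bar, rng)
            z = np.where(mask, known_t, z)
        eps = eps_model(z, t)
        mean = (z - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0  # no noise on the last step
        z = mean + np.sqrt(betas[t]) * noise
    if mask is not None:
        z = np.where(mask, known, z)  # keep the conditioning latents exact
    return z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, latent_dim, num_steps = 8, 4, 50  # hypothetical sizes
    betas = linear_beta_schedule(num_steps)
    eps_model = make_toy_eps_model(np.cumprod(1.0 - betas))

    # Unconditional generation of a sequence of latent embeddings.
    sample = p_sample_loop(eps_model, (seq_len, latent_dim), betas, rng)

    # Infilling: keep the first and last two positions, generate the middle.
    known = rng.standard_normal((seq_len, latent_dim))
    mask = np.zeros((seq_len, 1), dtype=bool)
    mask[:2] = True
    mask[-2:] = True
    infilled = p_sample_loop(eps_model, (seq_len, latent_dim), betas, rng, known, mask)
    print(sample.shape, infilled.shape)
```

The loop length depends only on the number of diffusion steps, not on the sequence length, which is the sense in which generation uses a constant number of iterative refinement steps.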

Code Implementations: 1 repo