SDAS Jul 11, 2021

PocketVAE: A Two-step Model for Groove Generation and Control

arXiv:2107.05009v1
Originality: Synthesis-oriented
AI Analysis

This work addresses the time-consuming process of creating drum tracks in DAWs for users unfamiliar with drums, offering a domain-specific incremental improvement.

The authors address the generation of realistic drum grooves in digital audio workstations with PocketVAE, a two-step model that applies grooves to user templates. They report that the two-step approach and a discrete latent representation improve learning of the original data distribution, and they discuss the benefit of adding control elements.

Creating a good drum track that imitates a skilled performer in digital audio workstations (DAWs) can be a time-consuming process, especially for those unfamiliar with drums. In this work, we introduce PocketVAE, a groove generation system that applies grooves to users' rudimentary MIDI tracks, i.e., templates. Grooves can be transferred from a reference track, generated randomly, or generated with conditions such as genre. Our system, consisting of a separate module for each groove component, takes a two-step approach that is analogous to a music creation process. First, the note module updates the user template through addition and deletion of notes; second, the velocity and microtiming modules add details to this generated note score. To model the drum notes, we apply a discrete latent representation method via a Vector Quantized Variational Autoencoder (VQ-VAE), as drum notes have a discrete property, unlike velocity and microtiming values. We show that our two-step approach and the use of a discrete encoding space improve the learning of the original data distribution. Additionally, we discuss the benefit of incorporating control elements (genre, velocity, and microtiming patterns) into the model.
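The discrete latent representation mentioned in the abstract can be illustrated with the quantization step that VQ-VAEs generally use: each continuous encoder output is replaced by its nearest vector in a learned codebook. The following is only a minimal sketch of that lookup, not the authors' implementation; the function names, the toy codebook, and the 2-dimensional encodings are all hypothetical, and training details (codebook updates, gradient handling) are omitted.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    encodings: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Map each encoder output to its nearest codebook vector.

    Returns the chosen codebook indices (the discrete codes) and the
    corresponding codebook vectors (the quantized latents).
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for z in encodings:
        # Pick the codebook entry with the smallest distance to z.
        k = min(range(len(codebook)), key=lambda i: squared_distance(z, codebook[i]))
        indices.append(k)
        quantized.append(list(codebook[k]))
    return indices, quantized


# Hypothetical toy values: a 3-entry codebook of 2-dimensional vectors
# and three encoder outputs (these numbers are illustrative only).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
encodings = [[0.1, -0.2], [0.9, 0.8], [0.2, 0.7]]
indices, quantized = quantize(encodings, codebook)
print(indices)
```

In this sketch the integer indices play the role of the discrete codes, which is the property the abstract argues suits drum notes, while velocity and microtiming remain continuous values.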

Code Implementations: 1 repo
