Learning source-aware representations of music in a discrete latent space
This work targets music producers and researchers who need analyzable and editable music representations, though it appears incremental because it builds on existing VQ-VAE techniques.
The paper tackles the problem of generating human-readable and editable music representations by proposing a method to learn source-aware latent representations using a Vector-Quantized Variational Auto-Encoder (VQ-VAE), resulting in a decomposed structure that allows manipulation of latent vectors for tasks like generating bass lines.
In recent years, neural-network-based methods have been proposed to generate representations from music, but these representations are not human-readable and can hardly be analyzed or edited by a human. To address this issue, we propose a novel method to learn source-aware latent representations of music through a Vector-Quantized Variational Auto-Encoder (VQ-VAE). We train our VQ-VAE to encode an input mixture into a tensor of integers in a discrete latent space, and design the latent vectors to have a decomposed structure that allows humans to manipulate them in a source-aware manner. This paper also shows that we can generate bass lines by estimating latent vectors in a discrete space.
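The idea of an integer-valued, source-decomposed latent can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the abstract does not specify the architecture, so the layout of one row of codes per source, the tensor sizes, the `BASS` row index, and the random arrays standing in for encoder outputs are all assumptions made for illustration. Only the nearest-neighbour codebook lookup is the standard VQ-VAE quantization step.

```python
import numpy as np

def quantize(z, codebook):
    """Map each continuous vector in z to the index of its nearest code.

    z:        (..., D) continuous vectors (stand-in for encoder outputs)
    codebook: (K, D) code vectors
    returns:  (...) integer indices in [0, K)
    """
    # Squared Euclidean distance from every vector in z to every code vector.
    dists = ((z[..., None, :] - codebook) ** 2).sum(axis=-1)
    return dists.argmin(axis=-1)

def dequantize(indices, codebook):
    """Look up the code vector for each integer index."""
    return codebook[indices]

rng = np.random.default_rng(0)
n_sources, n_frames, dim, n_codes = 4, 8, 16, 32  # hypothetical sizes

codebook = rng.normal(size=(n_codes, dim))
# Random stand-ins for the encoder outputs of two different mixtures.
z_a = rng.normal(size=(n_sources, n_frames, dim))
z_b = rng.normal(size=(n_sources, n_frames, dim))

# Each mixture becomes a tensor of integers: one row of codes per source (assumed layout).
codes_a = quantize(z_a, codebook)
codes_b = quantize(z_b, codebook)

# Source-aware edit: replace only the (assumed) bass row of mixture A with that of B.
BASS = 1  # hypothetical row index
edited = codes_a.copy()
edited[BASS] = codes_b[BASS]

# Continuous vectors that a decoder would consume after the edit.
z_edited = dequantize(edited, codebook)
```

Because the edit touches a single row of integers, the remaining rows of `edited` stay identical to those of `codes_a`, which is the sense in which a decomposed discrete latent is manipulable per source in this sketch.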