AS · LG · SD · Nov 26, 2021

Learning source-aware representations of music in a discrete latent space

Jinsung Kim, Yeong-Seok Jeong, Woosung Choi, Jaehwa Chung, Soonyoung Jung

arXiv:2111.13321v11.2

Originality: Incremental advance

AI Analysis

This work addresses a need among music producers and researchers for music representations they can analyze and edit, though it appears incremental because it builds on existing VQ-VAE techniques.

The paper tackles the problem of producing human-readable, editable music representations. It proposes learning source-aware latent representations with a Vector-Quantized Variational Auto-Encoder (VQ-VAE), whose discrete latent space has a decomposed structure that lets users manipulate latent vectors source by source. The paper also shows that bass lines can be generated by estimating latent vectors in this discrete space.

In recent years, neural-network-based methods have been proposed for generating representations of music, but these representations are not human-readable and are hard for a human to analyze or edit. To address this issue, we propose a novel method to learn source-aware latent representations of music through a Vector-Quantized Variational Auto-Encoder (VQ-VAE). We train our VQ-VAE to encode an input mixture into a tensor of integers in a discrete latent space, and design these codes to have a decomposed structure that allows humans to manipulate the latent vector in a source-aware manner. This paper also shows that we can generate bass lines by estimating latent vectors in a discrete space.
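To make "a tensor of integers in a discrete latent space" concrete, here is a minimal Python sketch, not the authors' implementation. It shows the generic VQ-VAE lookup step: each continuous encoder output is replaced by the index of its nearest codebook vector. The source-aware part is an illustrative assumption: the hypothetical `SOURCES` tuple fixes one row of codes per source, and all rows share one codebook. The paper's actual decomposition, codebook size, encoder, and decoder are not described on this page. `quantize`, `encode_sources`, `dequantize`, and `replace_source` are hypothetical helpers.

```python
from typing import List, Sequence

# Hypothetical source order: one row of integer codes per source.
# This layout (and the shared codebook) is an assumption, not taken from the paper.
SOURCES = ("vocals", "drums", "bass", "other")


def quantize(latents: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """For each latent vector, return the index of the nearest codebook
    vector under squared Euclidean distance (the VQ-VAE lookup step)."""
    indices = []
    for z in latents:
        dists = [sum((zi - ci) ** 2 for zi, ci in zip(z, c)) for c in codebook]
        indices.append(min(range(len(codebook)), key=lambda k: dists[k]))
    return indices


def encode_sources(latents_by_source: Sequence[Sequence[Sequence[float]]],
                   codebook: Sequence[Sequence[float]]) -> List[List[int]]:
    """Quantize one sequence of latents per source into a
    [num_sources][time] integer tensor (hypothetical layout)."""
    return [quantize(latents, codebook) for latents in latents_by_source]


def dequantize(indices: Sequence[int],
               codebook: Sequence[Sequence[float]]) -> List[List[float]]:
    """Map integer codes back to their codebook vectors, i.e. the
    input a decoder would receive."""
    return [list(codebook[k]) for k in indices]


def replace_source(codes: Sequence[Sequence[int]],
                   other: Sequence[Sequence[int]],
                   source: str) -> List[List[int]]:
    """Return a copy of `codes` whose row for `source` is taken from `other`.
    Both tensors are assumed to follow the hypothetical SOURCES row order."""
    s = SOURCES.index(source)
    return [list(other[s]) if i == s else list(row) for i, row in enumerate(codes)]
```

Swapping the `bass` row between two code tensors, as `replace_source` does, mimics the kind of source-aware editing the abstract describes. Turning the edited codes back into audio would need the trained VQ-VAE decoder, which this sketch does not model. The sketch also omits training details such as the straight-through gradient estimator and codebook loss.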

