ASLGSD · Apr 21, 2020

Vector Quantized Contrastive Predictive Coding for Template-based Music Generation

arXiv:2004.10120v1 · 21 citations
Originality: Incremental advance
AI Analysis

This work addresses unsupervised music generation for creative applications. It is an incremental advance, building on existing contrastive learning and Transformer methods.

The authors tackle the problem of generating variations of discrete sequences, such as music, without supervision. They propose Vector Quantized Contrastive Predictive Coding (VQ-CPC) to learn high-level representations, then use these representations in a Transformer to produce coherent, high-quality variations, as demonstrated on J.S. Bach chorales.

In this work, we propose a flexible method for generating variations of discrete sequences in which tokens can be grouped into basic units, like sentences in a text or bars in music. More precisely, given a template sequence, we aim to produce novel sequences that share perceptible similarities with the original template without relying on any annotation; our problem of generating variations is therefore intimately linked to the problem of learning relevant high-level representations without supervision. Our contribution is twofold. First, we propose a self-supervised encoding technique, named Vector Quantized Contrastive Predictive Coding, which learns a meaningful assignment of the basic units over a discrete set of codes, together with mechanisms for controlling the information content of these learnt discrete representations. Second, we show how these compressed representations can be used to generate variations of a template sequence by using an appropriate attention pattern in the Transformer architecture. We illustrate our approach on the corpus of J.S. Bach chorales, where we discuss the musical meaning of the learnt discrete codes and show that our proposed method generates coherent and high-quality variations of a given template.
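To make the two ingredients named in "Vector Quantized Contrastive Predictive Coding" concrete, here is a minimal pure-Python sketch, under stated assumptions: each basic unit (e.g. a bar) is assigned to its nearest codebook vector (vector quantization), and an InfoNCE-style contrastive loss, as used in standard CPC, rewards scoring the true positive above negatives. It assumes a plain dot-product score (CPC's learned bilinear map replaced by the identity), toy 2-D embeddings, and the hypothetical helper names `quantize` and `info_nce`. It is not the authors' implementation: it omits the encoder, the information-control mechanisms, the Transformer, and all training.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a: Vector, b: Vector) -> float:
    """Inner product, used here as a simplified (identity-map) CPC score."""
    return sum(x * y for x, y in zip(a, b))


def quantize(z: Vector, codebook: List[Vector]) -> int:
    """Vector quantization: index of the codebook entry nearest to z."""
    return min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))


def info_nce(context: Vector, positive: Vector, negatives: List[Vector]) -> float:
    """InfoNCE loss: negative log-softmax score of the positive sample
    among {positive} + negatives, computed with a stable log-sum-exp."""
    scores = [dot(context, positive)] + [dot(context, n) for n in negatives]
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[0]


# Toy example (hypothetical data): 3 codes and 3 bar embeddings in 2-D.
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
bar_embeddings = [[0.9, 0.1], [0.2, 0.8], [-0.7, -0.1]]

# Each bar is assigned to exactly one discrete code.
codes = [quantize(z, codebook) for z in bar_embeddings]
print(codes)  # [0, 1, 2]

# Simplification: the first bar's code vector stands in for the CPC context.
# The loss is lower when the positive agrees with the context than when a
# mismatched vector is scored as the positive.
context = codebook[codes[0]]
matched = info_nce(context, codebook[0], [codebook[1], codebook[2]])
mismatched = info_nce(context, codebook[1], [codebook[0], codebook[2]])
print(matched < mismatched)  # True
```

Because the positive's score always enters the log-sum-exp, the loss is never negative and approaches zero only when the positive dominates every negative.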

Code Implementations: 1 repo
