SD, LG, MM, AS
May 10, 2021

MuseMorphose: Full-Song and Fine-Grained Piano Music Style Transfer with One Transformer VAE

arXiv:2105.04090v3, 81 citations
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of controlling musical attributes such as rhythmic intensity and polyphony at the bar level in pop piano pieces, an incremental improvement over existing methods.

The paper combines Transformers and VAEs for symbolic music generation to achieve full-song, fine-grained piano music style transfer. The resulting model, MuseMorphose, outperforms RNN-based baselines on multiple metrics.

Transformers and variational autoencoders (VAE) have been extensively employed for symbolic (e.g., MIDI) domain music generation. While the former boast an impressive capability in modeling long sequences, the latter allow users to willingly exert control over different parts (e.g., bars) of the music to be generated. In this paper, we are interested in bringing the two together to construct a single model that exhibits both strengths. The task is split into two steps. First, we equip Transformer decoders with the ability to accept segment-level, time-varying conditions during sequence generation. Subsequently, we combine the developed and tested in-attention decoder with a Transformer encoder, and train the resulting MuseMorphose model with the VAE objective to achieve style transfer of long pop piano pieces, in which users can specify musical attributes including rhythmic intensity and polyphony (i.e., harmonic fullness) they desire, down to the bar level. Experiments show that MuseMorphose outperforms recurrent neural network (RNN) based baselines on numerous widely-used metrics for style transfer tasks.
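The first step in the abstract, giving a Transformer decoder segment-level, time-varying conditions, can be illustrated with a minimal sketch. The abstract does not spell out the mechanism, so this assumes one plausible reading of "in-attention": each token receives the condition vector of the bar it belongs to, and that vector is added to the hidden state before every layer. The names `expand_bar_conditions` and `decode_with_conditions` are hypothetical, the conditions are assumed to be already projected to the hidden size, and the `layers` callables stand in for real Transformer layers.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[List[Vector]], List[Vector]]


def expand_bar_conditions(
    bar_ids: Sequence[int], bar_conditions: Sequence[Vector]
) -> List[Vector]:
    """Give every token the condition vector of the bar it belongs to."""
    return [list(bar_conditions[b]) for b in bar_ids]


def decode_with_conditions(
    hidden: List[Vector],
    token_conditions: List[Vector],
    layers: Sequence[Layer],
) -> List[Vector]:
    """Add the per-token condition to the hidden states before every layer.

    Assumption: this per-layer re-injection is an illustrative reading of
    in-attention conditioning, not a confirmed description of the model.
    """
    for layer in layers:
        # Element-wise sum of each token's hidden state and its bar condition.
        hidden = [
            [h + c for h, c in zip(h_vec, c_vec)]
            for h_vec, c_vec in zip(hidden, token_conditions)
        ]
        hidden = layer(hidden)
    return hidden


def identity_layer(states: List[Vector]) -> List[Vector]:
    """Placeholder for a real Transformer layer (hypothetical)."""
    return states


# Three tokens: the first two belong to bar 0, the last one to bar 1.
bar_ids = [0, 0, 1]
bar_conditions = [[1.0, 0.0], [0.0, 2.0]]
token_conditions = expand_bar_conditions(bar_ids, bar_conditions)
initial_hidden = [[0.0, 0.0] for _ in bar_ids]
output = decode_with_conditions(
    initial_hidden, token_conditions, [identity_layer, identity_layer]
)
```

Because the condition is indexed by bar, changing one bar's vector only alters the signal seen by tokens in that bar, which is what makes bar-level control possible in this sketch.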

Code Implementations: 1 repo
