SD, LG, AS | Jul 27, 2023

Graph-based Polyphonic Multitrack Music Generation

arXiv:2307.14928v1
6 citations
h-index: 27
Originality: Incremental advance
AI Analysis

This work addresses the lack of graph-based deep learning methods for music generation, offering a novel approach for human-computer interaction in music co-creation, though it is incremental in applying existing deep learning techniques to a new representation.

The paper tackles the problem of generating polyphonic multitrack symbolic music by introducing a graph representation and a hierarchical Variational Autoencoder that separately generates structure and content, enabling conditional generation based on instrument timing. The model produces tonally and rhythmically consistent music, as shown through experiments on MIDI datasets, with visualizations indicating organization of the latent space according to musical concepts.

Graphs can be leveraged to model polyphonic multitrack symbolic music, where notes, chords and entire sections may be linked at different levels of the musical hierarchy by tonal and rhythmic relationships. Nonetheless, there is a lack of works that consider graph representations in the context of deep learning systems for music generation. This paper bridges this gap by introducing a novel graph representation for music and a deep Variational Autoencoder that generates the structure and the content of musical graphs separately, one after the other, with a hierarchical architecture that matches the structural priors of music. By separating the structure and content of musical graphs, it is possible to condition generation by specifying which instruments are played at certain times. This opens the door to a new form of human-computer interaction in the context of music co-creation. After training the model on existing MIDI datasets, the experiments show that the model is able to generate appealing short and long musical sequences and to realistically interpolate between them, producing music that is tonally and rhythmically consistent. Finally, the visualization of the embeddings shows that the model is able to organize its latent space in accordance with known musical concepts.
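
To make the separation of structure and content more concrete, here is a minimal sketch of how a binary "which instrument plays when" grid could be turned into a graph. It is an illustration only: the function name `build_structure_graph`, the edge labels, and the choice of edge types are assumptions made for this example and are not taken from the paper or its code. Content (the actual notes) would be attached to each node in a later step and is omitted here.

```python
from itertools import combinations


def build_structure_graph(structure):
    """Build a toy graph from a binary activation grid.

    structure[t][s] is 1 if track t plays at timestep s, else 0.
    Returns (nodes, edges): nodes are (track, timestep) pairs, edges are
    (node_index_a, node_index_b, label) triples.
    NOTE: hypothetical scheme for illustration, not the paper's definition.
    """
    # One node per active (track, timestep) cell.
    nodes = [
        (t, s)
        for t, row in enumerate(structure)
        for s, active in enumerate(row)
        if active
    ]
    index = {node: i for i, node in enumerate(nodes)}
    edges = []

    # Link tracks that are active at the same timestep.
    n_steps = len(structure[0]) if structure else 0
    for s in range(n_steps):
        active = [index[(t, s)] for t in range(len(structure)) if structure[t][s]]
        for a, b in combinations(active, 2):
            edges.append((a, b, "simultaneous"))

    # Link consecutive activations within the same track.
    for t, row in enumerate(structure):
        steps = [s for s, active in enumerate(row) if active]
        for prev, nxt in zip(steps, steps[1:]):
            edges.append((index[(t, prev)], index[(t, nxt)], "same_track"))

    return nodes, edges


# A user could specify this grid by hand to condition generation:
structure = [
    [1, 0, 1, 1],  # track 0, e.g. drums
    [1, 1, 0, 0],  # track 1, e.g. bass
    [0, 0, 1, 0],  # track 2, e.g. piano
]
nodes, edges = build_structure_graph(structure)
```

In this toy setting, conditioning amounts to fixing the grid and letting a model fill in the notes for each node, which is the kind of interaction the abstract describes at a high level.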

Code Implementations: 1 repo
