SD | LG | MM | AS | Jan 28, 2025

MIDI-GPT: A Controllable Generative Model for Computer-Assisted Multitrack Music Composition

arXiv:2501.17011v2 | 14 citations | h-index: 4 | AAAI
Originality: Incremental advance
AI Analysis

This work addresses the need for more flexible and controllable AI tools in music composition workflows, particularly for professionals in the creative industries. Its contribution is incremental, since it builds on existing Transformer architectures.

The authors address computer-assisted multitrack music composition with MIDI-GPT, a Transformer-based generative model that supports controllable infilling and conditioning on attributes such as instrument type and style. In their experiments, the model avoids duplicating its training data, and the attribute controls effectively enforce constraints on the generated music.

We present and release MIDI-GPT, a generative system based on the Transformer architecture that is designed for computer-assisted music composition workflows. MIDI-GPT supports the infilling of musical material at the track and bar level, and can condition generation on attributes including: instrument type, musical style, note density, polyphony level, and note duration. In order to integrate these features, we employ an alternative representation for musical material, creating a time-ordered sequence of musical events for each track and concatenating several tracks into a single sequence, rather than using a single time-ordered sequence where the musical events corresponding to different tracks are interleaved. We also propose a variation of our representation allowing for expressiveness. We present experimental results that demonstrate that MIDI-GPT is able to consistently avoid duplicating the musical material it was trained on, generate music that is stylistically similar to the training dataset, and that attribute controls allow enforcing various constraints on the generated material. We also outline several real-world applications of MIDI-GPT, including collaborations with industry partners that explore the integration and evaluation of MIDI-GPT into commercial products, as well as several artistic works produced using it.
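
The difference between the two representations described in the abstract can be illustrated with a minimal sketch. The token names (`TRACK_START_0`, `ONSET_4`, `PITCH_60`, and so on), the `Note` type, and both helper functions are hypothetical and chosen only for illustration; they are not MIDI-GPT's actual vocabulary or API, and bars, durations, and attribute controls are left out.

```python
from typing import List, NamedTuple


class Note(NamedTuple):
    track: int   # track index (hypothetical field)
    onset: int   # onset time in arbitrary ticks
    pitch: int   # MIDI pitch number


def interleaved(notes: List[Note]) -> List[str]:
    """Single time-ordered sequence: events from different tracks are mixed."""
    tokens: List[str] = []
    for n in sorted(notes, key=lambda n: (n.onset, n.track, n.pitch)):
        tokens += [f"TRACK_{n.track}", f"ONSET_{n.onset}", f"PITCH_{n.pitch}"]
    return tokens


def track_concatenated(notes: List[Note]) -> List[str]:
    """One time-ordered sequence per track, with the tracks concatenated."""
    tokens: List[str] = []
    for track in sorted({n.track for n in notes}):
        tokens.append(f"TRACK_START_{track}")
        track_notes = [n for n in notes if n.track == track]
        for n in sorted(track_notes, key=lambda n: (n.onset, n.pitch)):
            tokens += [f"ONSET_{n.onset}", f"PITCH_{n.pitch}"]
        tokens.append("TRACK_END")
    return tokens


# Two tracks, two notes each (made-up example data).
notes = [Note(0, 0, 60), Note(1, 0, 36), Note(0, 4, 64), Note(1, 4, 43)]

print(interleaved(notes))
print(track_concatenated(notes))
```

In the concatenated form each track occupies one contiguous span of the sequence, so a whole track can be removed and regenerated without touching the tokens of the others. That is one plausible reading of why this layout suits track-level infilling; the paper itself should be consulted for the actual token design.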

