SD · CL · MM · Nov 6, 2025

MIDI-LLM: Adapting Large Language Models for Text-to-MIDI Music Generation

Shih-Lun Wu, Yoon Kim, Cheng-Zhi Anna Huang

arXiv:2511.03942v1 · 9.32 citations · h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses automated music generation for users who need text-to-MIDI conversion. It is an incremental advance because it builds on existing LLM and text-to-music methods.

The paper tackles the generation of multitrack MIDI music from free-form text prompts by extending a large language model (LLM) with MIDI tokens and training it with a two-stage recipe. In the reported experiments, the resulting model, MIDI-LLM, achieves higher quality, better text control, and faster inference than the recent Text2midi model.

We present MIDI-LLM, an LLM for generating multitrack MIDI music from free-form text prompts. Our approach expands a text LLM's vocabulary to include MIDI tokens, and uses a two-stage training recipe to endow text-to-MIDI abilities. By preserving the original LLM's parameter structure, we can directly leverage the vLLM library for accelerated inference. Experiments show that MIDI-LLM achieves higher quality, better text control, and faster inference compared to the recent Text2midi model. Live demo at https://midi-llm-demo.vercel.app.
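The abstract's key idea is to grow a text LLM's vocabulary with MIDI tokens while leaving the rest of the parameter structure unchanged, which is what lets an off-the-shelf inference engine such as vLLM load the model. The sketch below illustrates that general technique on a toy embedding table. It is not the authors' code: the toy vocabulary, the MIDI token names (`<time_*>`, `<pitch_*>`), and the mean-of-existing-rows initialization are all illustrative assumptions. It also assumes token ids are contiguous and equal to embedding row indices.

```python
"""Toy sketch of LLM vocabulary expansion (illustrative assumptions only)."""
import random
from typing import Dict, List, Tuple


def expand_vocab(
    vocab: Dict[str, int],
    embeddings: List[List[float]],
    new_tokens: List[str],
) -> Tuple[Dict[str, int], List[List[float]]]:
    """Append new tokens and grow the embedding table to match.

    Assumes vocab ids are 0..n-1 and match embedding row indices.
    Existing ids and rows are untouched; each new row starts as the mean
    of the existing rows (one common heuristic, assumed here). The row
    width (hidden size) never changes, so only the vocabulary dimension
    grows. A model with an untied output head would grow that matrix too.
    """
    hidden = len(embeddings[0])
    mean_row = [sum(row[j] for row in embeddings) / len(embeddings) for j in range(hidden)]
    new_vocab = dict(vocab)
    new_emb = [list(row) for row in embeddings]  # copy; leave the input intact
    for tok in new_tokens:
        if tok in new_vocab:
            continue  # token already exists in the text vocabulary
        new_vocab[tok] = len(new_emb)  # next free row index becomes the id
        new_emb.append(list(mean_row))
    return new_vocab, new_emb


if __name__ == "__main__":
    random.seed(0)
    text_vocab = {"<s>": 0, "piano": 1, "jazz": 2, "</s>": 3}
    hidden_size = 8
    text_emb = [[random.gauss(0.0, 0.02) for _ in range(hidden_size)] for _ in text_vocab]
    # Hypothetical MIDI tokens: a few onset-time bins and pitch values
    midi_tokens = [f"<time_{t}>" for t in range(4)] + [f"<pitch_{p}>" for p in range(60, 64)]
    vocab, emb = expand_vocab(text_vocab, text_emb, midi_tokens)
    print(len(vocab), len(emb), len(emb[0]))  # 12 12 8
```

Because only the number of embedding rows changes, the expanded model keeps the same layer types and hidden size as the base LLM. In Hugging Face Transformers the analogous step is usually `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`. Whether MIDI-LLM used exactly these calls is not stated here.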

