SD · AI · AS · May 21, 2025

Moonbeam: A MIDI Foundation Model Using Both Absolute and Relative Music Attributes

arXiv:2505.15559v1 · 4 citations · h-index: 2 · Has Code · ICRA
Originality: Incremental advance
AI Analysis

This work addresses the need for better symbolic music processing among researchers and musicians, though it is an incremental advance that builds on existing transformer and tokenization methods.

Moonbeam is a transformer-based foundation model for symbolic music that tackles the problem of capturing both absolute and relative musical attributes, achieving improved accuracy and F1 scores in downstream classification tasks and outperforming baselines in conditional music generation.

Moonbeam is a transformer-based foundation model for symbolic music, pretrained on a large and diverse collection of MIDI data totaling 81.6K hours of music and 18 billion tokens. Moonbeam incorporates music-domain inductive biases by capturing both absolute and relative musical attributes through the introduction of a novel domain-knowledge-inspired tokenization method and Multidimensional Relative Attention (MRA), which captures relative music information without additional trainable parameters. Leveraging the pretrained Moonbeam, we propose 2 finetuning architectures with full anticipatory capabilities, targeting 2 categories of downstream tasks: symbolic music understanding and conditional music generation (including music infilling). Our model outperforms other large-scale pretrained music models in most cases in terms of accuracy and F1 score across 3 downstream music classification tasks on 4 datasets. Moreover, our finetuned conditional music generation model outperforms a strong transformer baseline with a REMI-like tokenizer. We open-source the code, pretrained model, and generated samples on GitHub.
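
As a rough illustration of the distinction the abstract draws between absolute and relative attributes, the sketch below derives relative values (onset differences and pitch intervals) from absolute ones and shows that they are unchanged under transposition. It is not Moonbeam's tokenizer or its MRA mechanism: the `Note` type, the tick-based onset unit, and both helper functions are hypothetical names introduced only for this example.

```python
from typing import List, NamedTuple, Tuple


class Note(NamedTuple):
    # Hypothetical minimal note record; real MIDI events carry more fields.
    onset: int  # absolute onset time, in ticks (assumed unit)
    pitch: int  # absolute MIDI pitch number (0-127)


def relative_attributes(notes: List[Note]) -> List[Tuple[int, int]]:
    """Return (onset difference, pitch interval) for each consecutive note pair."""
    return [(b.onset - a.onset, b.pitch - a.pitch) for a, b in zip(notes, notes[1:])]


def transpose(notes: List[Note], semitones: int) -> List[Note]:
    """Shift every absolute pitch by the same number of semitones."""
    return [Note(n.onset, n.pitch + semitones) for n in notes]


# A C major arpeggio (C4, E4, G4), one note every 480 ticks.
melody = [Note(0, 60), Note(480, 64), Note(960, 67)]
shifted = transpose(melody, 5)

# Absolute pitches differ after transposition...
print([n.pitch for n in melody], [n.pitch for n in shifted])
# ...but the relative attributes are identical.
print(relative_attributes(melody) == relative_attributes(shifted))
```

Absolute values identify which notes are played and when, while relative values describe the shape of the melody independently of key, which is one reason a model might want access to both.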

Code Implementations: 1 repo