SDApr 17

TinyMU: A Compact Audio-Language Model for Music Understanding

arXiv:2604.1584977.2h-index: 30
Predicted impact top 23% in SD · last 90 daysOriginality Incremental advance
AI Analysis

For researchers and practitioners needing efficient music understanding on edge devices, TinyMU demonstrates that small models can rival large ones under constrained budgets.

TinyMU is a compact 229M-parameter music-language model that achieves 82% of SOTA LALM performance on MuChoMusic while being 35x smaller, enabled by a curated 3.5M-sample dataset and a lightweight architecture.

Music understanding and reasoning are central challenges in the Music Information Research field, with applications ranging from retrieval and recommendation to music agents and virtual assistants. Recent Large Audio-Language Models (LALMs) have shown remarkable progress in answering music-related questions by following user instructions. However, their massive scale, often billions of parameters, results in expensive training, slow inference, and limited deployability on edge devices. In this work, we present TinyMU, a lightweight (229M) Music-Language Model (MLM) that achieves performance comparable to much larger LALMs while remaining efficient and compact. To train TinyMU, we introduce MusicSkills-3.5M, a carefully curated, music-grounded question-answering dataset with 3.5M samples. Spanning multiple-choice, binary, and open-ended formats, this dataset provides fine-grained supervision across diverse musical concepts. For its architecture, TinyMU leverages MATPAC++, the SOTA self-supervised audio encoder for fine-grained feature extraction. Paired with a lightweight linear projector, it efficiently aligns audio embeddings with the language model. Through extensive evaluation, we show that TinyMU performs strongly in both basic music understanding and complex reasoning. Notably, on the MuChoMusic benchmark, it achieves 82\% of SOTA LALM's performance despite being 35x smaller, highlighting the potential of small MLMs under constrained computational budgets.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes