SD | CL | AS | Sep 5, 2025

WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning

arXiv:2509.04744v1 | 10 citations | h-index: 10 | EMNLP
Originality: Incremental advance
AI Analysis

The paper addresses the need for standardized evaluation of MLLMs in practical music analysis tasks. The advance is incremental: it extends existing benchmarking approaches to a new domain.

The authors evaluate the reasoning abilities of multimodal large language models in symbolic music analysis by introducing WildScore, the first in-the-wild benchmark for this domain. Benchmarking on WildScore revealed both promising directions and persistent challenges in MLLMs' performance.

Recent advances in Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities across various vision-language tasks. However, their reasoning abilities in the multimodal symbolic music domain remain largely unexplored. We introduce WildScore, the first in-the-wild multimodal symbolic music reasoning and analysis benchmark, designed to evaluate MLLMs' capacity to interpret real-world music scores and answer complex musicological queries. Each instance in WildScore is sourced from genuine musical compositions and accompanied by authentic user-generated questions and discussions, capturing the intricacies of practical music analysis. To facilitate systematic evaluation, we propose a systematic taxonomy, comprising both high-level and fine-grained musicological ontologies. Furthermore, we frame complex music reasoning as multiple-choice question answering, enabling controlled and scalable assessment of MLLMs' symbolic music understanding. Empirical benchmarking of state-of-the-art MLLMs on WildScore reveals intriguing patterns in their visual-symbolic reasoning, uncovering both promising directions and persistent challenges for MLLMs in symbolic music reasoning and analysis. We release the dataset and code.
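Because the benchmark frames music reasoning as multiple-choice question answering, scoring reduces to comparing a predicted option letter with the gold letter and aggregating by taxonomy category. The following is a minimal sketch of that idea, not the authors' released code: the `Item` fields, the function name `accuracy_by_category`, and the category labels in the usage note are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    """One hypothetical benchmark instance (field names are assumptions)."""
    category: str  # high-level taxonomy label (placeholder values only)
    answer: str    # gold option letter, e.g. "B"


def accuracy_by_category(items, predictions):
    """Return {category: accuracy}, plus an "overall" entry.

    `predictions` is a list of option letters aligned with `items`.
    Matching ignores case and surrounding whitespace.
    """
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")

    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        total[item.category] += 1
        if pred.strip().upper() == item.answer.strip().upper():
            correct[item.category] += 1

    scores = {cat: correct[cat] / total[cat] for cat in total}
    n = sum(total.values())
    scores["overall"] = sum(correct.values()) / n if n else 0.0
    return scores
```

For example, with three items labelled with the placeholder categories "Harmony", "Harmony", and "Rhythm", gold answers A, B, C, and predictions "a", "C", "C", the function reports 0.5 for "Harmony", 1.0 for "Rhythm", and 2/3 overall. Reporting per-category scores alongside the overall figure is one way to show where a model's visual-symbolic reasoning breaks down.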

