IR, AI | Feb 12

IncompeBench: A Permissively Licensed, Fine-Grained Benchmark for Music Information Retrieval

Benjamin Clavié, Atoof Shakir, Jonah Turner, Sean Lee, Aamir Shakir, Makoto P. Kato

arXiv:2602.11941v1 | h-index: 2 | Has Code

Originality: Synthesis-oriented

AI Analysis

IncompeBench addresses a gap in MIR research by giving researchers and practitioners a fine-grained benchmark, though the contribution is incremental because it builds on existing multimodal retrieval frameworks.

The paper tackles the lack of high-quality benchmarks for music information retrieval (MIR) by introducing IncompeBench, a permissively licensed benchmark with 1,574 music snippets, 500 queries, and over 125,000 relevance judgments. The data is released as two publicly available datasets, a strict and a lenient variant.

Multimodal Information Retrieval has made significant progress in recent years, leveraging the increasingly strong multimodal abilities of deep pre-trained models to represent information across modalities. Music Information Retrieval (MIR), in particular, has considerably increased in quality, with neural representations of music even making their way into everyday products. However, there is a lack of high-quality benchmarks for evaluating music retrieval performance. To address this issue, we introduce IncompeBench, a carefully annotated benchmark comprising 1,574 permissively licensed, high-quality music snippets, 500 diverse queries, and over 125,000 individual relevance judgements. These annotations were created using a multi-stage pipeline, resulting in high agreement between human annotators and the generated data. The resulting datasets are publicly available at https://huggingface.co/datasets/mixedbread-ai/incompebench-strict and https://huggingface.co/datasets/mixedbread-ai/incompebench-lenient, with the prompts available at https://github.com/mixedbread-ai/incompebench-programs.
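
As a rough illustration of how per-query relevance judgements like these are typically turned into a retrieval score, the sketch below computes nDCG@k for a single query. The choice of nDCG, the graded integer labels, the dictionary layout, and the snippet identifiers are all assumptions made for this example; the abstract does not state the benchmark's evaluation metric or data schema.

```python
import math


def dcg(gains):
    # Discounted cumulative gain: the gain at 1-based rank r is divided by log2(r + 1).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked_ids, judgments, k=10):
    """nDCG@k for one query.

    ranked_ids: snippet ids in the order a system returned them.
    judgments: dict mapping snippet id -> graded relevance (assumed layout);
               ids without a judgement are treated as relevance 0.
    """
    gains = [judgments.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy data invented for illustration; not taken from IncompeBench.
toy_judgments = {"snippet_a": 2, "snippet_b": 1, "snippet_c": 0}
perfect_run = ["snippet_a", "snippet_b", "snippet_c"]
reversed_run = ["snippet_c", "snippet_b", "snippet_a"]

perfect_score = ndcg_at_k(perfect_run, toy_judgments, k=3)
reversed_score = ndcg_at_k(reversed_run, toy_judgments, k=3)
```

Averaging such per-query scores over all queries would give a single number per system. Under this assumed setup, a strict and a lenient set of judgements would simply be two different `judgments` mappings scored the same way.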
