SD | AI | Feb 11, 2025

JamendoMaxCaps: A Large Scale Music-caption Dataset with Imputed Metadata

arXiv:2502.07461v2 | 17 citations | h-index: 17 | IJCNN
Originality: Incremental advance
AI Analysis

This work addresses the need for a large-scale music-caption dataset for researchers working on music-language understanding tasks, providing a valuable resource for the music information retrieval community.

The authors introduce JamendoMaxCaps, a large-scale music-caption dataset of over 362,000 tracks, and validate its metadata-imputation approach with five quantitative measurements.

We introduce JamendoMaxCaps, a large-scale music-caption dataset featuring over 362,000 freely licensed instrumental tracks from the renowned Jamendo platform. The dataset includes captions generated by a state-of-the-art captioning model, enhanced with imputed metadata. We also introduce a retrieval system that leverages both musical features and metadata to identify similar songs, which are then used to fill in missing metadata using a local large language model (LLLM). This approach allows us to provide a more comprehensive and informative dataset for researchers working on music-language understanding tasks. We validate this approach quantitatively with five different measurements. By making the JamendoMaxCaps dataset publicly available, we provide a high-quality resource to advance research in music-language understanding tasks such as music retrieval, multimodal representation learning, and generative music models.
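The retrieve-then-impute idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the track records, the `alpha` weighting, the field-overlap metadata score, and all function names are hypothetical, and the majority vote is only an offline stand-in for the local LLM step (the sketch builds a prompt but does not call any model).

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def metadata_overlap(query_meta, cand_meta):
    """Fraction of the query's known fields that the candidate matches."""
    known = [k for k, v in query_meta.items() if v is not None]
    if not known:
        return 0.0
    return sum(query_meta[k] == cand_meta.get(k) for k in known) / len(known)


def retrieve_similar(query, catalog, k=2, alpha=0.7):
    """Rank tracks by a weighted mix of audio and metadata similarity.

    alpha is an assumed weighting, not a value taken from the paper.
    """
    scored = []
    for track in catalog:
        audio = cosine(query["embedding"], track["embedding"])
        meta = metadata_overlap(query["meta"], track["meta"])
        scored.append((alpha * audio + (1 - alpha) * meta, track))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [track for _, track in scored[:k]]


def build_imputation_prompt(query, neighbours, field):
    """Format the retrieved neighbours as context for a language model."""
    lines = [f"Similar track {t['id']}: {t['meta']}" for t in neighbours]
    lines.append(f"Target track {query['id']}: {query['meta']}")
    lines.append(f"Suggest a value for the missing field '{field}'.")
    return "\n".join(lines)


def majority_vote(neighbours, field):
    """Offline stand-in for the LLM: most common neighbour value."""
    values = [t["meta"].get(field) for t in neighbours
              if t["meta"].get(field) is not None]
    return Counter(values).most_common(1)[0][0] if values else None


# Toy data: two-dimensional "embeddings" and two metadata fields.
catalog = [
    {"id": "a", "embedding": [1.0, 0.0],
     "meta": {"genre": "ambient", "mood": "calm"}},
    {"id": "b", "embedding": [0.9, 0.1],
     "meta": {"genre": "ambient", "mood": "dreamy"}},
    {"id": "c", "embedding": [0.0, 1.0],
     "meta": {"genre": "rock", "mood": "energetic"}},
]
query = {"id": "q", "embedding": [1.0, 0.05],
         "meta": {"genre": None, "mood": "calm"}}

neighbours = retrieve_similar(query, catalog, k=2)
prompt = build_imputation_prompt(query, neighbours, "genre")
imputed_genre = majority_vote(neighbours, "genre")
```

In this toy setup the two ambient tracks are retrieved for the query, and their shared genre is what a downstream model would most plausibly propose. In practice the prompt would be passed to a locally hosted language model rather than resolved by voting.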

Code Implementations: 1 repo
