SD AI | Feb 11, 2025

JamendoMaxCaps: A Large Scale Music-caption Dataset with Imputed Metadata

Abhinaba Roy, Renhang Liu, Tongyu Lu, Dorien Herremans

arXiv:2502.07461v221.817 citations | h-index: 17 | Has Code | IJCNN

Originality: Incremental advance

AI Analysis

This work addresses the need for a large-scale music-caption dataset for researchers working on music-language understanding tasks, providing a valuable resource for the music information retrieval community.

The authors introduce JamendoMaxCaps, a music-caption dataset of over 362,000 tracks with model-generated captions and imputed metadata, and validate the metadata-imputation approach with five quantitative measurements.

We introduce JamendoMaxCaps, a large-scale music-caption dataset featuring over 362,000 freely licensed instrumental tracks from the renowned Jamendo platform. The dataset includes captions generated by a state-of-the-art captioning model, enhanced with imputed metadata. We also introduce a retrieval system that leverages both musical features and metadata to identify similar songs, which are then used to fill in missing metadata using a local large language model (LLLM). This approach allows us to provide a more comprehensive and informative dataset for researchers working on music-language understanding tasks. We validate this approach quantitatively with five different measurements. By making the JamendoMaxCaps dataset publicly available, we provide a high-quality resource to advance research in music-language understanding tasks such as music retrieval, multimodal representation learning, and generative music models.
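
The retrieve-then-impute idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each track already has a single embedding vector that combines musical features and metadata, and it replaces the local LLM step with a simple majority vote over the retrieved neighbours. The names `cosine`, `retrieve_similar`, `impute_metadata`, and the toy catalog are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_similar(query, catalog, k=2):
    """Return the k catalog tracks whose embeddings are closest to the query."""
    ranked = sorted(
        catalog,
        key=lambda track: cosine(query["embedding"], track["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def impute_metadata(query, neighbours, fields):
    """Fill missing fields by majority vote over neighbours.

    Assumption: the vote stands in for the LLM step described in the abstract.
    """
    filled = dict(query["metadata"])
    for field in fields:
        if filled.get(field) is not None:
            continue  # keep values that are already present
        values = [n["metadata"].get(field) for n in neighbours]
        votes = Counter(v for v in values if v is not None)
        if votes:
            filled[field] = votes.most_common(1)[0][0]
    return filled


# Toy data, invented for illustration only.
catalog = [
    {"id": "a", "embedding": [0.9, 0.1, 0.0],
     "metadata": {"genre": "ambient", "mood": "calm"}},
    {"id": "b", "embedding": [0.8, 0.2, 0.1],
     "metadata": {"genre": "ambient", "mood": None}},
    {"id": "c", "embedding": [0.0, 0.1, 0.9],
     "metadata": {"genre": "rock", "mood": "energetic"}},
]
query = {"id": "q", "embedding": [1.0, 0.0, 0.0],
         "metadata": {"genre": None, "mood": None}}

neighbours = retrieve_similar(query, catalog, k=2)
completed = impute_metadata(query, neighbours, ["genre", "mood"])
```

In this toy run the two nearest tracks are `a` and `b`, so the query inherits their shared genre and the only available mood value. An LLM-based imputer would instead receive the neighbours' metadata as context and generate the missing fields.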

