SD AI CL AS
Nov 16, 2023

The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation

ByteDance
arXiv:2311.10057v3
73 citations
h-index: 42
Originality: Synthesis-oriented
AI Analysis

The paper gives researchers in music-and-language AI a new dataset for assessing model performance, though the contribution is incremental because it builds on existing evaluation frameworks.

The authors introduce the Song Describer Dataset (SDD), a crowdsourced corpus of 1.1k audio-caption pairs for 706 music recordings, built to evaluate music-and-language models. They benchmark popular models on music captioning, text-to-music generation, and music-language retrieval, and emphasize the importance of cross-dataset evaluation.

We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality audio-caption pairs, designed for the evaluation of music-and-language models. The dataset consists of 1.1k human-written natural language descriptions of 706 music recordings, all publicly accessible and released under Creative Commons licenses. To showcase the use of our dataset, we benchmark popular models on three key music-and-language tasks (music captioning, text-to-music generation and music-language retrieval). Our experiments highlight the importance of cross-dataset evaluation and offer insights into how researchers can use SDD to gain a broader understanding of model performance.

Code Implementations: 1 repo
