SDAIAS | Jun 10, 2024

MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing

arXiv:2406.06375v1 | 7 citations | Has Code
Originality: Synthesis-oriented
AI Analysis

MOSA provides a benchmark for research in cross-modal music processing, although the contribution is incremental in that it builds on existing data collection efforts.

The authors address the lack of a large-scale cross-modal dataset for music processing by creating the MOSA dataset, which includes 3-D motion capture, audio, and semantic annotations for 742 performances, totaling more than 30 hours and 570K notes. They demonstrate its use in tasks such as beat detection and motion generation.

In cross-modal music processing, translation between visual, auditory, and semantic content opens up new possibilities as well as new challenges. Constructing such a transformative scheme depends on a benchmark corpus with a comprehensive data infrastructure, and assembling a large-scale cross-modal dataset is itself a major challenge. In this paper, we present the MOSA (Music mOtion with Semantic Annotation) dataset, which contains high-quality 3-D motion capture data, aligned audio recordings, and note-by-note semantic annotations of pitch, beat, phrase, dynamics, articulation, and harmony for 742 performances by 23 professional musicians, comprising more than 30 hours and 570K notes of data. To our knowledge, this is the largest cross-modal music dataset with note-level annotations to date. To demonstrate the usage of the MOSA dataset, we present several innovative cross-modal music information retrieval (MIR) and musical content generation tasks, including the detection of beats, downbeats, phrases, and expressive content from audio, video, and motion data, and the generation of musicians' body motion from given music audio. The dataset and code are available alongside this publication (https://github.com/yufenhuang/MOSA-Music-mOtion-and-Semantic-Annotation-dataset).
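To make "note-by-note semantic annotations" aligned with audio and motion more concrete, here is a minimal Python sketch of how one note-level record might be represented and queried. The class `NoteAnnotation`, its field names, and the helpers `beat_times` and `motion_frame_index` are hypothetical illustrations, not the MOSA dataset's actual file format or the authors' code. The fixed motion-capture frame rate passed to `motion_frame_index` is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NoteAnnotation:
    # Hypothetical per-note record; field names are illustrative only,
    # not the layout of the released MOSA files.
    onset_s: float           # note onset in seconds, on a clock shared by audio and motion
    offset_s: float          # note offset in seconds
    pitch: int               # MIDI note number
    is_beat: bool            # onset coincides with a beat
    is_downbeat: bool        # onset coincides with a downbeat
    phrase_id: int           # index of the phrase the note belongs to
    dynamic: str             # e.g. "p", "mf", "f"
    articulation: str        # e.g. "legato", "staccato"
    harmony: Optional[str]   # chord label, if annotated


def beat_times(notes: List[NoteAnnotation], downbeats_only: bool = False) -> List[float]:
    """Return sorted onset times of notes flagged as beats (or downbeats).

    In a purely note-level scheme like this one, only beats that coincide
    with a note onset can be represented.
    """
    if downbeats_only:
        return sorted(n.onset_s for n in notes if n.is_downbeat)
    return sorted(n.onset_s for n in notes if n.is_beat)


def motion_frame_index(time_s: float, fps: float) -> int:
    """Map an annotation time to the nearest motion-capture frame (assumes a fixed frame rate)."""
    return round(time_s * fps)


if __name__ == "__main__":
    notes = [
        NoteAnnotation(0.0, 0.4, 60, True, True, 0, "mf", "legato", "C"),
        NoteAnnotation(0.5, 0.9, 62, True, False, 0, "mf", "legato", "C"),
        NoteAnnotation(0.75, 0.9, 64, False, False, 0, "mf", "staccato", None),
    ]
    print(beat_times(notes))                       # [0.0, 0.5]
    print(beat_times(notes, downbeats_only=True))  # [0.0]
    print(motion_frame_index(0.5, fps=120.0))      # 60
```

Keeping every modality on one shared time axis is what lets a beat label from the annotations be looked up directly in the audio or motion stream; the real dataset may store timing differently.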

Code Implementations: 1 repo