SD · AI · Jun 8, 2016

Symbolic Music Data Version 1.0

arXiv:1606.02542v1 · 3 citations
Originality: Synthesis-oriented
AI Analysis

This paper provides a domain-specific resource for music AI researchers, but its contribution is incremental: it focuses on dataset creation rather than on novel methods, and its broader impact is limited.

The authors introduced a new dataset for training machine learning models on symbolic music data, including a newly collected corpus of 20K MIDI files, and described preprocessing, cleaning, and data splits based on clustering.

In this document, we introduce a new dataset designed for training machine learning models of symbolic music data. Five datasets are provided, one of which comes from a newly collected corpus of 20K MIDI files. We describe our preprocessing and cleaning pipeline, which includes excluding a number of files based on scores from a previously developed probabilistic machine learning model. We also define training, testing and validation splits for the new dataset, based on a clustering scheme that we also describe. Some simple histograms are included.
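The abstract names two pipeline steps, score-based exclusion of files and a clustering-based train/validation/test split, without giving their details. The Python sketch below illustrates the general idea only and is not the authors' procedure. The score dictionary, the threshold, the "keep if the score is at or above the threshold" convention, the cluster assignments, and the 80/10/10 fractions are all hypothetical. The key property it shows is that every file in a cluster lands in the same split, which keeps near-duplicate pieces from leaking between training and test data.

```python
import random
from collections import defaultdict


def filter_by_score(scores, threshold):
    """Keep files whose model score is at or above `threshold`.

    `scores` maps a file name to a score from some probabilistic model.
    ASSUMPTION (hypothetical): higher scores mean more plausible music.
    """
    return {f: s for f, s in scores.items() if s >= threshold}


def cluster_split(cluster_of, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole clusters to train/valid/test splits.

    `cluster_of` maps a file name to a (hypothetical) cluster id.
    Clusters are shuffled, then filled greedily into train, valid and
    test until each split's approximate file quota is reached, so the
    resulting fractions are only approximate.
    """
    clusters = defaultdict(list)
    for f, c in cluster_of.items():
        clusters[c].append(f)

    ids = sorted(clusters)
    random.Random(seed).shuffle(ids)  # deterministic shuffle of cluster ids

    total = len(cluster_of)
    train_cut = fractions[0] * total
    valid_cut = (fractions[0] + fractions[1]) * total

    splits = {"train": [], "valid": [], "test": []}
    count = 0  # number of files assigned so far
    for c in ids:
        if count < train_cut:
            name = "train"
        elif count < valid_cut:
            name = "valid"
        else:
            name = "test"
        splits[name].extend(sorted(clusters[c]))  # whole cluster, one split
        count += len(clusters[c])
    return splits


if __name__ == "__main__":
    # Hypothetical scores and cluster ids, for illustration only.
    scores = {"a.mid": -2.1, "b.mid": -9.5, "c.mid": -1.7, "d.mid": -3.0}
    kept = filter_by_score(scores, threshold=-5.0)
    hypothetical_clusters = {"a.mid": 0, "c.mid": 0, "d.mid": 1}
    cluster_of = {f: hypothetical_clusters[f] for f in kept}
    print(cluster_split(cluster_of))
```

Splitting by cluster rather than by individual file trades exact split sizes for isolation between splits; with a real corpus of many small clusters, the greedy quotas end up close to the requested fractions.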

