Learning Features of Music from Scratch
This provides a new dataset and benchmark for music research, enabling evaluation of machine learning methods on note prediction tasks, but it is incremental as it applies existing methods to new data.
The paper introduces MusicNet, a large-scale dataset of classical music with instrument and note annotations, and benchmarks several machine learning architectures for multi-label note prediction, showing that end-to-end models learn frequency-selective filters as low-level audio representations.
This paper introduces a new large-scale music dataset, MusicNet, to serve as a source of supervision and evaluation of machine learning methods for music research. MusicNet consists of hundreds of freely-licensed classical music recordings by 10 composers, written for 11 instruments, together with instrument/note annotations resulting in over 1 million temporal labels on 34 hours of chamber music performances under various studio and microphone conditions. The paper defines a multi-label classification task to predict notes in musical recordings, along with an evaluation protocol, and benchmarks several machine learning architectures for this task: i) learning from spectrogram features; ii) end-to-end learning with a neural net; iii) end-to-end learning with a convolutional neural net. These experiments show that end-to-end models trained for note prediction learn frequency selective filters as a low-level representation of audio.