Melodic Phrase Segmentation By Deep Neural Networks
This work addresses a classical task in music information retrieval for automated music analysis, but its contribution is incremental: it adapts existing neural network methods to a domain-specific problem.
The paper tackles melodic phrase segmentation in music information retrieval by adapting neural network architectures to symbolic music data, and addresses the sparse labeling problem with tailored label engineering and training techniques. The CNN-CRF architecture performs best, offering finer segmentation and faster training, with CNN, Bi-LSTM-CNN, and Bi-LSTM-CRF as acceptable alternatives.
Automated melodic phrase detection and segmentation is a classical task in content-based music information retrieval and a key step towards automated music structure analysis. However, traditional methods still cannot satisfy practical requirements. In this paper, we explore and adapt various neural network architectures to see whether they generalize to the symbolic representation of music and produce satisfactory melodic phrase segmentation. The main obstacle to applying deep-learning methods to phrase detection is the sparse labeling of training sets. We propose two tailored label engineering schemes, with corresponding training techniques for different neural networks, so that decisions are made at the sequence level. Experimental results show that the CNN-CRF architecture performs best, offering finer segmentation and faster training, while CNN, Bi-LSTM-CNN and Bi-LSTM-CRF are acceptable alternatives.
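To make the sparse labeling problem concrete, the following minimal Python sketch contrasts two ways of labeling a note sequence. The abstract does not specify the two label engineering schemes, so both encodings here are illustrative assumptions rather than the paper's method: a binary encoding that marks only phrase-final notes, and a denser position tagging (B = phrase start, I = inside, E = phrase end) of the kind commonly paired with sequence models. The function names and the tag set are hypothetical.

```python
from typing import List


def _validated_ends(num_notes: int, phrase_ends: List[int]) -> List[int]:
    """Return sorted, de-duplicated phrase-end indices after a range check."""
    ends = sorted(set(phrase_ends))
    if any(e < 0 or e >= num_notes for e in ends):
        raise ValueError("phrase end index out of range")
    return ends


def sparse_labels(num_notes: int, phrase_ends: List[int]) -> List[int]:
    """Binary encoding (assumed): 1 on the last note of a phrase, 0 elsewhere."""
    ends = set(_validated_ends(num_notes, phrase_ends))
    return [1 if i in ends else 0 for i in range(num_notes)]


def dense_labels(num_notes: int, phrase_ends: List[int]) -> List[str]:
    """Position tagging (assumed): B = first note, E = last note, I = inside."""
    tags = ["I"] * num_notes
    start = 0
    for end in _validated_ends(num_notes, phrase_ends):
        tags[start] = "B"
        tags[end] = "E"  # for a one-note phrase, E overrides B
        start = end + 1
    if start < num_notes:
        tags[start] = "B"  # trailing notes form an unfinished phrase
    return tags


# Hypothetical melody of 8 notes with phrases ending at indices 3 and 7.
binary = sparse_labels(8, [3, 7])
tagged = dense_labels(8, [3, 7])
positive_ratio = sum(binary) / len(binary)
```

In this toy melody only two of eight notes carry a positive binary label, and real phrases are typically longer, so the positive class becomes rarer still. A dense tagging gives every note an informative label, which is one plausible way to let a model decide over the whole sequence instead of isolated notes.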