SD AS · Apr 24, 2018

Vocal melody extraction using patch-based CNN

arXiv:1804.09202v19.551 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of efficiently extracting vocal melodies from complex music for applications in music information retrieval, though it appears incremental as it builds on existing deep learning approaches.

The paper tackles vocal melody extraction in polyphonic music by proposing a patch-based CNN model with a novel time-frequency representation that enhances pitch contours and suppresses harmonics, resulting in excellent speed and competitive accuracy compared to other deep learning methods.

This paper presents a patch-based convolutional neural network (CNN) model for vocal melody extraction in polyphonic music, inspired by object detection in image processing. The input to the model is a novel time-frequency representation that enhances the pitch contours and suppresses the harmonic components of a signal. This succinct data representation and the patch-based CNN model enable an efficient training process with limited labeled data. Experiments on various datasets show excellent speed and competitive accuracy compared to other deep learning approaches.
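The abstract does not say how patches are selected, so the following is only a rough sketch of the general idea. It assumes candidate patches are cut around the strongest local peaks of each frame in a time-frequency representation, much as an object detector proposes regions in an image. The function name `extract_peak_patches`, the patch size, and the peak-picking rule are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np

def extract_peak_patches(tfr, patch_size=5, peaks_per_frame=2):
    """Cut square patches centred on the strongest local peaks of each frame.

    tfr: 2-D array of shape (n_freq_bins, n_frames) holding non-negative
         saliency values (a stand-in for the paper's representation).
    Returns (patches, centers):
      patches: (n_patches, patch_size, patch_size)
      centers: (n_patches, 2) with the (freq_bin, frame) of each patch centre.
    """
    if patch_size % 2 == 0:
        raise ValueError("patch_size must be odd so each patch has a centre")
    half = patch_size // 2
    n_freq, n_frames = tfr.shape
    # Zero-pad so patches near the borders keep the full size.
    padded = np.pad(tfr, half, mode="constant")

    patches, centers = [], []
    interior = np.arange(1, n_freq - 1)
    for t in range(n_frames):
        col = tfr[:, t]
        # Local maxima along frequency (edge bins are ignored).
        is_peak = (col[interior] > col[interior - 1]) & (col[interior] >= col[interior + 1])
        peak_bins = interior[is_peak]
        # Keep only the strongest few peaks in this frame.
        order = np.argsort(col[peak_bins])[::-1][:peaks_per_frame]
        for f in peak_bins[order]:
            # Bin (f, t) sits at (f + half, t + half) in the padded array.
            patches.append(padded[f:f + patch_size, t:t + patch_size])
            centers.append((f, t))

    if not patches:
        return np.empty((0, patch_size, patch_size)), np.empty((0, 2), dtype=int)
    return np.stack(patches), np.array(centers)
```

In a pipeline of this kind, each patch would then be passed to a small CNN that decides whether it contains a vocal pitch contour. That classifier is not shown here, and its design is left to the paper.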
