SD LG Jun 1, 2016

Automatic tagging using deep convolutional neural networks

Keunwoo Choi, George Fazekas, Mark Sandler

arXiv:1606.00298v135.9348 citations

Originality: Incremental advance

AI Analysis

This work addresses content-based music tagging, with applications in music recommendation and organization. It is an incremental advance because it builds on existing deep learning methods for audio processing.

The authors evaluate fully convolutional neural networks (FCNs) with different architectures and input types for automatic music tagging. A 4-layer model achieves state-of-the-art performance on the MagnaTagATune dataset, and deeper models perform better on the larger Million Song Dataset.

We present a content-based automatic music tagging algorithm using fully convolutional neural networks (FCNs). We evaluate different architectures consisting of 2D convolutional layers and subsampling layers only. In the experiments, we measure the AUC-ROC scores of architectures with different complexities and input types using the MagnaTagATune dataset, where a 4-layer architecture shows state-of-the-art performance with mel-spectrogram input. Furthermore, we evaluate the performance of architectures with varying numbers of layers on a larger dataset (Million Song Dataset) and find that deeper models outperform the 4-layer architecture. The experiments show that the mel-spectrogram is an effective time-frequency representation for automatic tagging and that more complex models benefit from more training data.
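The AUC-ROC metric named in the abstract can be illustrated with a small sketch. Tagging is a multi-label problem, so the sketch below computes one AUC per tag and averages over tags; that averaging scheme is an assumption made for illustration, not something the abstract specifies. The function names `auc_roc` and `mean_tag_auc` and the toy data are hypothetical.

```python
def auc_roc(labels, scores):
    """AUC-ROC for a single tag, computed from pairwise comparisons.

    Equivalent to the probability that a randomly chosen positive clip
    receives a higher score than a randomly chosen negative clip.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def mean_tag_auc(label_matrix, score_matrix):
    """Average the per-tag AUC over all tags (rows are clips, columns are tags)."""
    n_tags = len(label_matrix[0])
    aucs = []
    for t in range(n_tags):
        labels = [row[t] for row in label_matrix]
        scores = [row[t] for row in score_matrix]
        aucs.append(auc_roc(labels, scores))
    return sum(aucs) / n_tags


# Toy data (made up): 4 clips, 2 tags.
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.4, 0.7], [0.6, 0.15], [0.5, 0.1]]
print(mean_tag_auc(labels, scores))  # 0.875
```

The pairwise loop is quadratic in the number of clips, which is fine for illustration; a rank-based implementation would be preferable for datasets the size of those mentioned above.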
