Representation Learning of Music Using Artist Labels
This work addresses the problem of noisy or expensive annotations in music feature learning, a concern for both researchers and practitioners. It is incremental, however, in that it builds on existing supervised methods and applies them with a different label type.
The paper tackles the challenge of learning discriminative music features by proposing a supervised approach using artist labels as objective metadata, achieving performance comparable to state-of-the-art methods in music classification and retrieval tasks.
In the music domain, feature learning has been conducted mainly in two ways: unsupervised learning based on sparse representations, or supervised learning with semantic labels such as music genre. However, finding discriminative features in an unsupervised way is challenging, and supervised feature learning with semantic labels may involve noisy or expensive annotation. In this paper, we present a supervised feature learning approach that uses artist labels, which are annotated on every single track, as objective metadata. We propose two deep convolutional neural networks (DCNNs) to learn deep artist features. One is a plain DCNN trained with all of the artist labels simultaneously, and the other is a Siamese DCNN trained with a subset of the artist labels based on artist identity. We apply the trained models to music classification and retrieval tasks in transfer learning settings. The results show that our approach is comparable to previous state-of-the-art methods, indicating that it captures general music audio features as well as models trained with semantic labels do. We also discuss the advantages and disadvantages of the two models.
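To make the Siamese setup more concrete, the following is a minimal sketch of how training examples could be drawn from artist labels and scored. The abstract does not specify the sampling scheme or the loss, so everything here is an assumption for illustration: the function names `sample_triplet` and `hinge_loss`, the use of cosine similarity with a max-margin hinge loss, and the margin value of 0.4. The plain lists of floats stand in for the feature vectors a DCNN would produce.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sample_triplet(tracks_by_artist, num_negatives, rng):
    """Draw an anchor and a positive track from one artist, and one
    negative track from each of `num_negatives` other artists.
    (Hypothetical sampling scheme, not taken from the paper.)"""
    artists = sorted(tracks_by_artist)
    # The anchor artist needs at least two tracks to form a positive pair.
    eligible = [a for a in artists if len(tracks_by_artist[a]) >= 2]
    anchor_artist = rng.choice(eligible)
    anchor, positive = rng.sample(tracks_by_artist[anchor_artist], 2)
    others = [a for a in artists if a != anchor_artist]
    negative_artists = rng.sample(others, num_negatives)
    negatives = [rng.choice(tracks_by_artist[a]) for a in negative_artists]
    return anchor, positive, negatives


def hinge_loss(anchor_vec, positive_vec, negative_vecs, margin=0.4):
    """Max-margin loss (assumed): the same-artist pair should be more
    similar than each different-artist pair by at least `margin`."""
    pos = cosine(anchor_vec, positive_vec)
    return sum(
        max(0.0, margin - pos + cosine(anchor_vec, neg))
        for neg in negative_vecs
    )
```

In this sketch the artist label is used only to decide which tracks count as "same" or "different", so the number of artists seen during training is not tied to the size of an output layer. A plain classifier over all artist labels would instead need one output unit per artist.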