Convolutional Neural Network Achieves Human-level Accuracy in Music Genre Classification
This work addresses the problem of automating music genre classification for applications in music analysis. It shows a significant improvement over previous methods, although the approach itself is incremental.
The paper tackles music genre classification by proposing a method that combines knowledge from human perception studies with the neurophysiology of the auditory system, achieving human-level accuracy of 70% on a 10-genre task.
Music genre classification is one example of content-based analysis of music signals. Traditionally, human-engineered features have been used to automate this task, achieving 61% accuracy in 10-genre classification. However, this is still below the 70% accuracy that humans achieve on the same task. Here, we propose a new method that combines knowledge from human perception studies of music genre classification with the neurophysiology of the auditory system. The method works by training a simple convolutional neural network (CNN) to classify a short segment of the music signal. The genre of a piece of music is then determined by splitting it into short segments and combining the CNN's predictions across all of them. After training, this method achieves human-level (70%) accuracy, and the filters learned by the CNN resemble the spectrotemporal receptive fields (STRFs) found in the auditory system.
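The segment-then-combine step described above can be illustrated with a minimal sketch. The abstract does not say how the per-segment predictions are combined, so averaging the class probabilities is an assumption here (majority voting would be another plausible choice). The function names, the segment length, and the stand-in classifier `toy_predict_proba` are all hypothetical. In the actual method a trained CNN would take the classifier's place.

```python
import numpy as np

N_GENRES = 10  # the 10-genre task mentioned in the abstract


def split_into_segments(signal, segment_len):
    """Split a 1-D signal into non-overlapping segments, dropping any remainder."""
    n_segments = len(signal) // segment_len
    return signal[: n_segments * segment_len].reshape(n_segments, segment_len)


def classify_track(signal, segment_len, predict_proba):
    """Predict a genre for a whole track from per-segment predictions.

    predict_proba maps an array of shape (n_segments, segment_len) to class
    probabilities of shape (n_segments, n_genres). Averaging the
    probabilities is an assumed combination rule, not the paper's stated one.
    """
    segments = split_into_segments(signal, segment_len)
    segment_probs = predict_proba(segments)
    mean_probs = segment_probs.mean(axis=0)
    return int(np.argmax(mean_probs)), mean_probs


def toy_predict_proba(segments):
    """Hypothetical stand-in for a trained CNN: fixed random linear map + softmax."""
    rng = np.random.default_rng(0)
    weights = rng.standard_normal((segments.shape[1], N_GENRES))
    logits = segments @ weights
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


# Usage: a synthetic "track" of 1050 samples, split into segments of 100.
track = np.random.default_rng(1).standard_normal(1050)
genre, probs = classify_track(track, 100, toy_predict_proba)
```

Averaging probabilities lets confident segments weigh more than ambiguous ones, whereas a majority vote would treat every segment equally. Which rule works better is an empirical question that this sketch does not settle.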