Graph Representation Learning for Audio & Music Genre Classification
This work addresses music genre classification for audio content analysis and reports incremental improvements in performance.
The paper tackles music genre classification by combining graph neural networks (GNNs) with convolutional neural networks (CNNs), achieving state-of-the-art results on the GTZAN and AudioSet datasets.
Music genre is arguably one of the most important and discriminative pieces of information about music and audio content. Approaches based on visual representations have been explored on spectrograms for music genre classification. However, the lack of quality data and augmentation techniques makes it difficult to apply deep learning successfully. We discuss the application of graph neural networks to this task, motivated by their strong inductive bias, and show that a combination of a CNN and a GNN achieves state-of-the-art results on the GTZAN and AudioSet (Imbalanced Music) datasets. We also discuss the role of Siamese neural networks as an analogue of GNNs for learning edge similarity weights. Finally, we perform a visual analysis to understand our model's field of view over the spectrogram for different genre labels.
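As a rough illustration of how CNN features and a GNN could fit together, the sketch below builds a similarity graph over per-clip embeddings and applies one graph-convolution step. It is a minimal sketch under stated assumptions, not the authors' architecture: the random `clip_embeddings` stand in for CNN outputs on spectrograms, cosine similarity stands in for the edge weights a Siamese network would learn, and the function names `cosine_similarity_graph` and `gcn_layer` are hypothetical.

```python
import numpy as np

def cosine_similarity_graph(embeddings, k=2):
    """Build a symmetric k-nearest-neighbour graph weighted by cosine similarity.

    Assumption: cosine similarity is a stand-in for learned edge weights.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # exclude self when picking neighbours
    n = sim.shape[0]
    adj = np.zeros((n, n))
    neighbours = np.argsort(-sim, axis=1)[:, :k]  # k most similar clips per row
    rows = np.repeat(np.arange(n), k)
    cols = neighbours.ravel()
    adj[rows, cols] = np.clip(sim[rows, cols], 0.0, None)  # non-negative weights
    return np.maximum(adj, adj.T)  # symmetrise

def gcn_layer(adj, features, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm_adj = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ features @ weight, 0.0)

rng = np.random.default_rng(0)
clip_embeddings = rng.normal(size=(6, 8))  # hypothetical CNN outputs for 6 clips
weight = rng.normal(size=(8, 4))           # untrained projection, illustration only
adj = cosine_similarity_graph(clip_embeddings, k=2)
hidden = gcn_layer(adj, clip_embeddings, weight)
print(hidden.shape)
```

In a trained system the projection `weight` and the edge weights would be learned jointly with the CNN. Here they are fixed so that the propagation step can be inspected in isolation.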