Content-based feature exploration for transparent music recommendation using self-attentive genre classification
This work addresses the need for transparent music recommendations, but its contribution is incremental: it applies existing self-attention techniques to feature extraction without introducing a new paradigm.
The study tackles interpretability in music recommender systems by using self-attention to extract and analyze lyric and acoustic features from large datasets. The resulting methods yield interpretable characteristics of both lyrical and musical content, shown through visualizations and similarity comparisons.
Interpreting retrieved results is an important issue in music recommender systems, particularly from the user's perspective. In this study, we investigate methods for making content features interpretable using self-attention. We extract lyric features with a self-attentive genre classification model trained on the lyrics of 140,000 tracks. Likewise, we extract acoustic features with a self-attentive acoustic model trained on the audio signals of 120,000 tracks. The experimental results show that the proposed methods yield characteristics that are interpretable in terms of both lyrical and musical content. We demonstrate this by visualizing the attention weights and by presenting the most similar songs retrieved using lyric or audio features.
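The two interpretability devices named in the abstract, attention weights and feature-based similarity, can be illustrated with a minimal sketch. It assumes an additive attention pooling layer over a sequence of frame or token embeddings and cosine similarity between the pooled vectors. The function names, dimensions, and random parameters are hypothetical placeholders and do not reproduce the trained models described above.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()


def attention_pool(H, W, v):
    """Pool a sequence into one feature vector with additive attention.

    H: (T, d) sequence of token or frame embeddings
    W: (d, a) projection matrix (assumed, randomly initialised here)
    v: (a,)   scoring vector (assumed, randomly initialised here)
    Returns the attention weights (T,) and the pooled feature (d,).
    """
    scores = np.tanh(H @ W) @ v   # one scalar score per position
    weights = softmax(scores)     # weights sum to 1; these are what one would visualise
    pooled = weights @ H          # attention-weighted average of the sequence
    return weights, pooled


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Placeholder data standing in for two songs' embedded lyrics or audio frames.
rng = np.random.default_rng(0)
T, d, a = 6, 8, 4
W = rng.normal(size=(d, a))
v = rng.normal(size=a)
song_a = rng.normal(size=(T, d))
song_b = rng.normal(size=(T, d))

weights_a, feat_a = attention_pool(song_a, W, v)
_, feat_b = attention_pool(song_b, W, v)
sim = cosine_similarity(feat_a, feat_b)
```

In this sketch, `weights_a` plays the role of the attention weights that would be visualised over lyric tokens or audio frames, and ranking candidate songs by `sim` corresponds to retrieving the most similar songs from the pooled features.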