Music theme recognition using CNN and self-attention
This work addresses music autotagging for mood/theme recognition, but its contribution is incremental: it combines existing architectures (MobileNetV2 and Transformer self-attention) and targets a single dataset and competition.
The authors tackle music mood/theme recognition on the MTG-Jamendo dataset by combining a MobileNetV2-based CNN with a self-attention block, placing 4th on the PR-AUC-macro leaderboard of the MediaEval 2019 competition.
We present an efficient architecture to detect moods and themes in music tracks on the autotagging-moodtheme subset of the MTG-Jamendo dataset. Our approach consists of two blocks: a CNN block based on the MobileNetV2 architecture, and a self-attention block from the Transformer architecture that captures long-term temporal characteristics. We show that our proposed model produces a significant improvement over the baseline model. Our model (team name: AMLAG) achieves 4th place on the PR-AUC-macro leaderboard in MediaEval 2019: Emotion and Theme Recognition in Music Using Jamendo.
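
The role of the self-attention block can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It assumes the CNN block has already turned the audio into a short sequence of frame-level feature vectors, and it applies single-head scaled dot-product self-attention so that every frame can draw on every other frame, followed by mean pooling over time and one sigmoid output per tag. The sizes `T`, `D`, and `NUM_TAGS`, the random weights, and all function names are hypothetical placeholders. The number of heads, the pooling strategy, and the output layer of the actual model are not stated in this summary.

```python
import math
import random


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over T frame vectors.

    Returns the attended sequence (T x D) and the attention weights (T x T).
    """
    q = matmul(frames, w_q)
    k = matmul(frames, w_k)
    v = matmul(frames, w_v)
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def tag_probabilities(attended, w_out):
    """Mean-pool over time, then apply an independent sigmoid per tag.

    Sigmoids (not a softmax) are used because autotagging is multi-label.
    The pooling choice is an assumption made for this sketch.
    """
    n_frames = len(attended)
    pooled = [sum(col) / n_frames for col in zip(*attended)]
    logits = matmul([pooled], w_out)[0]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: 6 time frames, 4 features per frame, 3 tags.
T, D, NUM_TAGS = 6, 4, 3
rng = random.Random(0)

frames = random_matrix(T, D, rng)  # stand-in for the CNN block's output
w_q, w_k, w_v = (random_matrix(D, D, rng) for _ in range(3))
w_out = random_matrix(D, NUM_TAGS, rng)

attended, weights = self_attention(frames, w_q, w_k, w_v)
probs = tag_probabilities(attended, w_out)
```

Each row of `weights` sums to one and spans the whole sequence, which is what lets a frame attend to distant frames regardless of how far apart they are in time.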