SD, CL, MM, AS (Feb 26, 2023)

Multi-Modality in Music: Predicting Emotion in Music from High-Level Audio Features and Lyrics

arXiv:2302.13321v1, 9 citations, h-index: 3
Originality: Synthesis-oriented
AI Analysis

This work addresses music emotion recognition for applications like recommendation systems, but it is incremental as it builds on existing datasets and methods.

The paper tackled music emotion recognition by comparing multi-modal (audio features and lyrics) to uni-modal approaches, finding that multi-modal features outperform audio alone for predicting valence, with 5 out of 11 audio features contributing most to performance.

This paper tests whether a multi-modal approach to music emotion recognition (MER) performs better than a uni-modal one on high-level song features and lyrics. We use 11 song features retrieved from the Spotify API, combined with lyrics features including sentiment, TF-IDF, and ANEW, to predict valence and arousal (Russell, 1980) scores on the Deezer Mood Detection Dataset (DMDD) (Delbouys et al., 2018) with 4 different regression models. We find that, of the 11 high-level song features, mainly 5 contribute to the performance, and that multi-modal features do better than audio alone when predicting valence. We made our code publicly available.
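To make the multi-modal setup concrete, the following is a minimal sketch of how audio features and lyrics TF-IDF weights could be concatenated into one feature vector per song before being passed to a regression model. It is not the authors' code: the song data, the three audio feature names in `AUDIO_KEYS`, and the helper functions `tfidf_vectors` and `multimodal_features` are all hypothetical, and the smoothed IDF formula is one common variant chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical toy data: values and feature names are placeholders,
# not the 11 features or the dataset used in the paper.
AUDIO_KEYS = ["energy", "danceability", "tempo"]
songs = [
    {"audio": {"energy": 0.81, "danceability": 0.70, "tempo": 0.55},
     "lyrics": "love love sunshine dance"},
    {"audio": {"energy": 0.22, "danceability": 0.31, "tempo": 0.40},
     "lyrics": "rain alone cry rain"},
]


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF rows) for a list of tokenised documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumed variant): ln((1 + n) / (1 + df)) + 1.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty lyrics
        rows.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, rows


def multimodal_features(songs):
    """Concatenate audio features with lyrics TF-IDF weights per song."""
    docs = [s["lyrics"].lower().split() for s in songs]
    vocab, text_rows = tfidf_vectors(docs)
    audio_rows = [[s["audio"][k] for k in AUDIO_KEYS] for s in songs]
    rows = [a + t for a, t in zip(audio_rows, text_rows)]
    return rows, AUDIO_KEYS + vocab


features, names = multimodal_features(songs)
```

A uni-modal baseline would use only the first `len(AUDIO_KEYS)` columns of each row, so the same regression model can be fitted on both variants and the valence and arousal errors compared.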

Code Implementations: 1 repo
