Joint sentiment analysis of lyrics and audio in music
This work addresses mood perception in music, a problem relevant to researchers and developers. Its contribution is incremental: it builds on existing methods by combining modalities.
The paper tackles sentiment analysis in music by evaluating separate models for lyrics and audio, then proposing and testing approaches for combining them, which generally improves performance.
Sentiment or mood can be expressed at various levels in music. Automatic analysis usually works on the audio signal itself, but the lyrics can also play a crucial role in how mood is perceived. We first evaluate various models for sentiment analysis based on lyrics and audio separately. These approaches already show satisfactory results, but they also exhibit weaknesses, the causes of which we examine in more detail. Furthermore, we propose and evaluate different approaches to combining the audio and lyrics results. Considering both modalities generally leads to improved performance. We investigate misclassifications and contradictions between audio and lyrics sentiment, including intentional ones, more closely and identify possible causes. Finally, we address fundamental problems in this research area, such as high subjectivity, lack of data, and inconsistency in emotion taxonomies.
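
To make the idea of combining the audio and lyrics results concrete, the following is a minimal sketch of one possible combination scheme: late fusion by weighted averaging of per-class probabilities, together with a check for contradictions between the two modalities. It is an illustration under stated assumptions, not the method evaluated in the paper. The two-class label set, the weight `audio_weight`, and the function names `fuse_late`, `predict`, and `modalities_disagree` are all hypothetical.

```python
from typing import Dict

Probs = Dict[str, float]  # maps a sentiment label to its probability


def fuse_late(audio_probs: Probs, lyrics_probs: Probs,
              audio_weight: float = 0.5) -> Probs:
    """Weighted average of the class probabilities of both modalities.

    Assumption: both models output probabilities over the same label set.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if audio_probs.keys() != lyrics_probs.keys():
        raise ValueError("both modalities must use the same labels")
    lyrics_weight = 1.0 - audio_weight
    return {
        label: audio_weight * audio_probs[label]
        + lyrics_weight * lyrics_probs[label]
        for label in audio_probs
    }


def predict(probs: Probs) -> str:
    """Return the label with the highest probability."""
    return max(probs, key=probs.get)


def modalities_disagree(audio_probs: Probs, lyrics_probs: Probs) -> bool:
    """True if audio and lyrics point to different sentiment labels."""
    return predict(audio_probs) != predict(lyrics_probs)


if __name__ == "__main__":
    # Hypothetical scores for an upbeat-sounding song with sad lyrics.
    audio = {"positive": 0.7, "negative": 0.3}
    lyrics = {"positive": 0.2, "negative": 0.8}

    fused = fuse_late(audio, lyrics, audio_weight=0.5)
    print(predict(fused))
    print(modalities_disagree(audio, lyrics))
```

In this sketch, the disagreement check is kept separate from the fusion step so that songs with contradictory audio and lyrics sentiment can be inspected individually instead of being silently averaged away.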