SD AI LG MM AS

Apr 13, 2021

Comparison and Analysis of Deep Audio Embeddings for Music Emotion Recognition

arXiv:2104.06517v1

19.344 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of automating emotion recognition in music for applications in music recommendation and analysis, but it is incremental as it applies existing embedding methods to a specific domain.

The paper tackled the problem of capturing complex emotions in music for Music Emotion Recognition (MER) by investigating pre-trained deep audio embeddings like L3-Net and VGGish, and found that these methods improved performance over previous baseline models across four datasets.

Emotion is a complicated notion present in music that is hard to capture even with fine-tuned feature engineering. In this paper, we investigate the utility of state-of-the-art pre-trained deep audio embedding methods to be used in the Music Emotion Recognition (MER) task. Deep audio embedding methods allow us to efficiently capture the high dimensional features into a compact representation. We implement several multi-class classifiers with deep audio embeddings to predict emotion semantics in music. We investigate the effectiveness of L3-Net and VGGish deep audio embedding methods for music emotion inference over four music datasets. The experiments with several classifiers on the task show that the deep audio embedding solutions can improve the performances of the previous baseline MER models. We conclude that deep audio embeddings represent musical emotion semantics for the MER task without expert human engineering.
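
The pipeline the abstract describes (pre-trained embeddings fed to a multi-class classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the tiny 3-dimensional vectors stand in for frame-level L3-Net or VGGish outputs, mean pooling is one common way to get a clip-level vector, and the nearest-centroid rule is a stand-in for the several classifiers the paper evaluates. The helper names `pool_frames`, `fit_centroids`, and `predict` are hypothetical.

```python
from math import dist
from statistics import fmean


def pool_frames(frames):
    """Mean-pool a list of equal-length vectors into a single vector."""
    return [fmean(column) for column in zip(*frames)]


def fit_centroids(clips, labels):
    """Average the clip vectors of each emotion class into one centroid."""
    grouped = {}
    for vec, label in zip(clips, labels):
        grouped.setdefault(label, []).append(vec)
    return {label: pool_frames(vecs) for label, vecs in grouped.items()}


def predict(centroids, clip):
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], clip))


# Toy frame-level "embeddings" (assumption: real ones come from a
# pre-trained model and have far more dimensions).
training_frames = {
    "happy": [
        [[0.9, 0.1, 0.0], [1.0, 0.2, 0.1]],
        [[0.8, 0.0, 0.1], [0.9, 0.1, 0.0]],
    ],
    "sad": [
        [[0.0, 0.9, 0.8], [0.1, 1.0, 0.9]],
        [[0.1, 0.8, 0.9], [0.0, 0.9, 1.0]],
    ],
}

clips, labels = [], []
for label, clip_list in training_frames.items():
    for frames in clip_list:
        clips.append(pool_frames(frames))  # one vector per clip
        labels.append(label)

centroids = fit_centroids(clips, labels)

# Classify an unseen clip from its pooled frame embeddings.
query = pool_frames([[0.85, 0.1, 0.05], [0.95, 0.05, 0.0]])
predicted = predict(centroids, query)
```

The point of the sketch is the division of labor: the embedding model supplies the features, so the classifier on top can stay simple and needs no hand-engineered audio descriptors.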
