On Perceived Emotion in Expressive Piano Performance: Further Experimental Evidence for the Relevance of Mid-level Perceptual Features
This work addresses the challenge of music emotion recognition for researchers and practitioners in audio content analysis, but it is incremental as it builds on existing evidence for Mid-level features.
The study tackled the problem of discerning emotional qualities between different performances of the same piano piece by analyzing feature effectiveness in predicting arousal and valence, finding that Mid-level perceptual features significantly contributed to performance-wise variation, outperforming a pre-trained emotion model.
Despite recent advances in audio content-based music emotion recognition, a question that remains to be explored is whether an algorithm can reliably discern emotional or expressive qualities between different performances of the same piece. In the present work, we analyze several sets of features on their effectiveness in predicting arousal and valence of six different performances (by six famous pianists) of Bach's Well-Tempered Clavier Book 1. These features include low-level acoustic features, score-based features, features extracted using a pre-trained emotion model, and Mid-level perceptual features. We compare their predictive power by evaluating them on several experiments designed to test performance-wise or piece-wise variations of emotion. We find that Mid-level features show significant contribution in performance-wise variation of both arousal and valence -- even better than the pre-trained emotion model. Our findings add to the evidence of Mid-level perceptual features being an important representation of musical attributes for several tasks -- specifically, in this case, for capturing the expressive aspects of music that manifest as perceived emotion of a musical performance.