Two-level Explanations in Music Emotion Recognition
This work addresses the need for interpretable explanations in music emotion recognition. The contribution is incremental: it builds on existing models to make their predictions more transparent.
The authors tackle the problem of uninterpretable predictions in music emotion recognition by proposing a two-step procedure: spectrogram-level audio patterns are linked to mid-level perceptual features, which are in turn linked to the emotion predictions. This yields specific musical explanations for individual predictions.
Current machine learning models for music emotion recognition generally work quite well, but they do not give meaningful or intuitive explanations for their predictions. In this work, we propose a two-step procedure to arrive at spectrogram-level explanations that connect certain aspects of the audio to interpretable mid-level perceptual features, and these in turn to the actual emotion prediction. This makes it possible to focus on specific musical reasons for a prediction (in terms of perceptual features), and to trace these back to patterns in the audio that can be interpreted visually and acoustically.
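To make the two levels concrete, the following is a minimal sketch, not the method used in this work. The abstract does not specify the model architecture or the attribution technique, so everything here is an assumption made for illustration: random toy weights stand in for trained models, a linear layer maps mid-level features to emotions, and simple patch occlusion stands in for the spectrogram-level explainer. All names (`mid_level`, `explain_emotion`, `explain_mid_feature`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumptions, chosen only to keep the example small).
N_MELS, N_FRAMES = 8, 16   # spectrogram: frequency bins x time frames
N_MID, N_EMO = 3, 2        # number of mid-level features and emotions

# Hypothetical stand-ins for trained weights.
W_audio_to_mid = rng.normal(size=(N_MID, N_MELS * N_FRAMES))
W_mid_to_emo = rng.normal(size=(N_EMO, N_MID))


def mid_level(spec):
    """Toy 'audio -> mid-level features' model (a real one would be a deep network)."""
    return np.tanh(W_audio_to_mid @ spec.ravel() / spec.size)


def emotion(mid):
    """Linear 'mid-level -> emotion' layer without bias, so contributions are additive."""
    return W_mid_to_emo @ mid


def explain_emotion(spec, emo_idx):
    """Level 1: contribution of each mid-level feature to one emotion score."""
    return W_mid_to_emo[emo_idx] * mid_level(spec)


def explain_mid_feature(spec, mid_idx, patch=4):
    """Level 2: occlusion map of how much each spectrogram patch drives one mid-level feature."""
    base = mid_level(spec)[mid_idx]
    heat = np.zeros_like(spec)
    for r in range(0, spec.shape[0], patch):
        for c in range(0, spec.shape[1], patch):
            occluded = spec.copy()
            occluded[r:r + patch, c:c + patch] = 0.0  # silence one patch
            # Drop in the feature value when the patch is removed.
            heat[r:r + patch, c:c + patch] = base - mid_level(occluded)[mid_idx]
    return heat


spec = rng.random((N_MELS, N_FRAMES))          # random stand-in for a real spectrogram
contrib = explain_emotion(spec, emo_idx=0)     # which perceptual features matter
top_feature = int(np.argmax(np.abs(contrib)))  # most influential mid-level feature
heatmap = explain_mid_feature(spec, top_feature)  # where in the audio it comes from
```

In this sketch, `contrib` answers "which perceptual features drove this emotion prediction", and `heatmap` traces the most influential feature back to regions of the spectrogram. Because the toy emotion layer is linear and has no bias, the contributions sum exactly to the predicted emotion score.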