SD, AI, HC, AS | Aug 13, 2024

A Theory-Based Explainable Deep Learning Architecture for Music Emotion

arXiv:2408.07113v1 | 5 citations | h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses explainable emotion prediction in music, with applications such as digital advertising. The advance is incremental: it builds on existing CNN methods by adding theory-based filters.

The paper predicts time-varying emotional responses to music using a theory-based, explainable CNN classifier. The model performs comparably to atheoretical deep learning models and better than models built on handcrafted features. In the advertising application, ads shown in emotionally similar contexts increase engagement (e.g., lower skip rates).

This paper develops a theory-based, explainable deep learning convolutional neural network (CNN) classifier to predict the time-varying emotional response to music. We design novel CNN filters that leverage the frequency harmonics structure from acoustic physics known to impact the perception of musical features. Our theory-based model is more parsimonious, but provides comparable predictive performance to atheoretical deep learning models, while performing better than models using handcrafted features. Our model can be complemented with handcrafted features, but the performance improvement is marginal. Importantly, the harmonics-based structure placed on the CNN filters provides better explainability for how the model predicts emotional response (valence and arousal), because emotion is closely related to consonance, a perceptual feature defined by the alignment of harmonics. Finally, we illustrate the utility of our model with an application involving digital advertising. Motivated by YouTube mid-roll ads, we conduct a lab experiment in which we exogenously insert ads at different times within videos. We find that ads placed in emotionally similar contexts increase ad engagement (lower skip rates, higher brand recall rates). Ad insertion based on emotional similarity metrics predicted by our theory-based, explainable model produces comparable or better engagement relative to atheoretical models.
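To make the idea of a harmonics-structured filter concrete, here is a minimal Python sketch. It is not the paper's architecture, which the abstract does not specify. It only illustrates the general principle that a filter can be restricted to respond at integer multiples of a fundamental frequency, so that it fires strongly when a sound's harmonics align with it. The function names (`harmonic_bins`, `dft_magnitude`, `harmonic_response`), the synthetic tone, and all parameter values are assumptions chosen for illustration.

```python
import cmath
import math


def harmonic_bins(f0_hz, n_harmonics, sample_rate, n_samples):
    """DFT bin indices nearest to f0, 2*f0, ... that lie below Nyquist."""
    bins = []
    for k in range(1, n_harmonics + 1):
        freq = k * f0_hz
        if freq < sample_rate / 2:
            bins.append(round(freq * n_samples / sample_rate))
    return bins


def dft_magnitude(signal, bin_index):
    """Magnitude of a single DFT bin, computed directly from the definition."""
    n = len(signal)
    acc = sum(
        x * cmath.exp(-2j * math.pi * bin_index * i / n)
        for i, x in enumerate(signal)
    )
    return abs(acc)


def harmonic_response(signal, f0_hz, n_harmonics, sample_rate):
    """Sum of spectral magnitudes at the harmonics of f0 (a fixed, sparse filter)."""
    bins = harmonic_bins(f0_hz, n_harmonics, sample_rate, len(signal))
    return sum(dft_magnitude(signal, b) for b in bins)


# Hypothetical test tone: a 220 Hz fundamental plus its 2nd and 3rd harmonics.
sample_rate = 2000
n_samples = 2000  # one second of audio, so each DFT bin is 1 Hz wide
tone = [
    sum(math.sin(2 * math.pi * 220 * k * i / sample_rate) / k for k in (1, 2, 3))
    for i in range(n_samples)
]

aligned = harmonic_response(tone, 220.0, 3, sample_rate)     # filter matches the tone
misaligned = harmonic_response(tone, 233.0, 3, sample_rate)  # filter is off by a semitone
print(aligned > misaligned)
```

In this sketch the filter aligned with the tone's fundamental collects nearly all of the spectral energy, while the misaligned one collects almost none. A learned CNN filter constrained to such harmonic positions would have far fewer free parameters than an unconstrained one, which is one plausible reading of the "more parsimonious" claim above.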

