AS · LG · Aug 21, 2024

The Whole Is Bigger Than the Sum of Its Parts: Modeling Individual Annotators to Capture Emotional Variability

James Tavernor, Yara El-Tawil, Emily Mower Provost

arXiv:2408.11956v25.14 citations · h-index: 36 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of capturing nuanced emotional variability in speech emotion recognition, which matters for model accuracy in affective computing. The advance is incremental, since it builds on existing distribution-based methods.

The paper tackles the high variability of emotion labels across multiple annotators. It learns to predict each annotator's labels and builds distributions from continuous model outputs, which the authors report yields more accurate emotion distributions than prior work in both within- and cross-corpus settings.

Emotion expression and perception are nuanced, complex, and highly subjective processes. When multiple annotators label emotional data, the resulting labels contain high variability. Most speech emotion recognition tasks address this by averaging annotator labels as ground truth. However, this process omits the nuance of emotion and inter-annotator variability, which are important signals to capture. Previous work has attempted to learn distributions to capture emotion variability, but these methods also lose information about the individual annotators. We address these limitations by learning to predict individual annotators and by introducing a novel method to create distributions from continuous model outputs that permit the learning of emotion distributions during model training. We show that this combined approach can result in emotion distributions that are more accurate than those seen in prior work, in both within- and cross-corpus settings.
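To make the contrast between averaging and distribution modeling concrete, here is a minimal sketch of one way continuous per-annotator outputs could be turned into a distribution. It is not the authors' method: the Gaussian soft-binning, the function name `soft_histogram`, the 1 to 5 rating scale, the bandwidth, and the example scores are all assumptions chosen for illustration.

```python
import math


def soft_histogram(values, bin_centers, bandwidth=0.5):
    """Turn continuous scores into a normalized distribution over bins.

    Each score spreads Gaussian-weighted mass over all bins, so the result
    changes smoothly with the scores, unlike a hard histogram.
    (Illustrative assumption, not the paper's formulation.)
    """
    if not values:
        raise ValueError("need at least one value")
    mass = [0.0] * len(bin_centers)
    for v in values:
        # Unnormalized Gaussian weight of this score for every bin.
        weights = [math.exp(-0.5 * ((v - c) / bandwidth) ** 2) for c in bin_centers]
        total = sum(weights)
        # Each score contributes exactly one unit of mass in total.
        for i, w in enumerate(weights):
            mass[i] += w / total
    n = len(values)
    return [m / n for m in mass]


# Hypothetical continuous predictions for four annotators on one utterance
# (for example, valence on an assumed 1-5 scale). Two rate it low, two high.
annotator_predictions = [1.2, 1.5, 4.6, 4.8]
bins = [1.0, 2.0, 3.0, 4.0, 5.0]

dist = soft_histogram(annotator_predictions, bins)
mean_label = sum(annotator_predictions) / len(annotator_predictions)

print("mean label:", round(mean_label, 3))
print("distribution:", [round(p, 3) for p in dist])
```

In this toy case the averaged label falls near the middle of the scale, a rating no annotator gave, while the distribution keeps the two clusters of low and high ratings visible. That is the kind of inter-annotator variability the abstract argues averaging discards.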

