CL | Feb 20, 2024

Handling Ambiguity in Emotion: From Out-of-Domain Detection to Distribution Estimation

arXiv:2402.12862v1 | 29 citations | h-index: 68 | ACL
Originality: Incremental advance
AI Analysis

This work addresses inconsistent human annotations in speech emotion recognition. It offers a new way to handle ambiguity, though the advance over existing methods is incremental.

The paper tackles ambiguous emotion labels in speech in two ways. First, it detects ambiguous utterances as out-of-domain samples using evidential deep learning, which retains classification accuracy. Second, it reframes emotion recognition as distribution estimation so that individual annotations are taken into account. It reports superior performance on the IEMOCAP and CREMA-D datasets.

The subjective perception of emotion leads to inconsistent labels from human annotators. Typically, utterances lacking majority-agreed labels are excluded when training an emotion classifier, which causes problems when encountering ambiguous emotional expressions during testing. This paper investigates three methods to handle ambiguous emotion. First, we show that incorporating utterances without majority-agreed labels as an additional class in the classifier reduces the classification performance of the other emotion classes. Then, we propose detecting utterances with ambiguous emotions as out-of-domain samples by quantifying the uncertainty in emotion classification using evidential deep learning. This approach retains the classification accuracy while effectively detecting ambiguous emotion expressions. Furthermore, to obtain fine-grained distinctions among ambiguous emotions, we propose representing emotion as a distribution instead of a single class label. The task is thus re-framed from classification to distribution estimation where every individual annotation is taken into account, not just the majority opinion. The evidential uncertainty measure is extended to quantify the uncertainty in emotion distribution estimation. Experimental results on the IEMOCAP and CREMA-D datasets demonstrate the superior capability of the proposed method in terms of majority class prediction, emotion distribution estimation, and uncertainty estimation.
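The two ideas in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption: the uncertainty measure follows the commonly used Dirichlet-based formulation of evidential deep learning (non-negative evidence per class, concentration alpha = evidence + 1, uncertainty = K / sum(alpha)), which may differ from what the paper actually uses. The label set `EMOTIONS` and the function names `soft_label` and `dirichlet_uncertainty` are hypothetical.

```python
from collections import Counter

# Hypothetical four-class label set, chosen only for illustration.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def soft_label(annotations, classes=EMOTIONS):
    """Turn individual annotator votes into an emotion distribution.

    Every annotation counts, not just the majority opinion.
    """
    counts = Counter(annotations)
    total = sum(counts[c] for c in classes)
    if total == 0:
        raise ValueError("no annotation matches the known classes")
    return [counts[c] / total for c in classes]


def dirichlet_uncertainty(evidence):
    """Expected class probabilities and uncertainty from per-class evidence.

    Assumes the common evidential formulation: alpha_k = evidence_k + 1,
    expected probability alpha_k / S, and uncertainty K / S with S = sum(alpha).
    In a real model the evidence would be a non-negative network output.
    """
    k = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    expected = [a / strength for a in alpha]
    uncertainty = k / strength
    return expected, uncertainty


if __name__ == "__main__":
    # Three annotators disagree: no single label captures the utterance.
    print(soft_label(["happy", "happy", "neutral"]))
    # No evidence at all gives maximal uncertainty (a candidate ambiguous sample).
    print(dirichlet_uncertainty([0.0, 0.0, 0.0, 0.0]))
    # Strong evidence for one class gives low uncertainty.
    print(dirichlet_uncertainty([0.0, 16.0, 0.0, 0.0]))
```

Under this formulation, an utterance could be flagged as ambiguous when its uncertainty exceeds a threshold chosen on held-out data. The threshold and the decision rule are likewise assumptions, not details taken from the abstract.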
