CL · AI · SD · AS · Dec 13, 2021

Detecting Emotion Carriers by Combining Acoustic and Lexical Representations

arXiv:2112.06603v1
Originality: Incremental advance
AI Analysis

This work addresses emotion understanding in personal narratives, with applications to natural language processing and dialogue systems. It is an incremental advance: it extends prior lexical-only methods with acoustic features.

The paper addresses the identification of Emotion Carriers (EC), the segments of a spoken narrative that best explain the narrator's emotional state. It combines acoustic and lexical representations and finds that late fusion significantly improves detection performance.

Personal narratives (PN) - spoken or written - are recollections of facts, people, events, and thoughts from one's own experience. Emotion recognition and sentiment analysis tasks are usually defined at the utterance or document level. However, in this work, we focus on Emotion Carriers (EC) defined as the segments (speech or text) that best explain the emotional state of the narrator ("loss of father", "made me choose"). Once extracted, such EC can provide a richer representation of the user state to improve natural language understanding and dialogue modeling. In previous work, it has been shown that EC can be identified using lexical features. However, spoken narratives should provide a richer description of the context and the users' emotional state. In this paper, we leverage word-based acoustic and textual embeddings as well as early and late fusion techniques for the detection of ECs in spoken narratives. For the acoustic word-level representations, we use Residual Neural Networks (ResNet) pretrained on separate speech emotion corpora and fine-tuned to detect EC. Experiments with different fusion and system combination strategies show that late fusion leads to significant improvements for this task.
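The difference between the early and late fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the equal weighting, the 0.5 threshold, and the toy probabilities are all assumptions made for illustration. Early fusion is shown as concatenating per-word acoustic and lexical vectors before a single classifier; late fusion is shown as averaging the per-word EC probabilities produced by two separate models.

```python
from typing import List, Sequence


def early_fusion(
    acoustic: Sequence[Sequence[float]],
    lexical: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate per-word acoustic and lexical vectors (hypothetical helper).

    The result would be fed to one classifier that sees both modalities.
    """
    if len(acoustic) != len(lexical):
        raise ValueError("both modalities must cover the same words")
    return [list(a) + list(l) for a, l in zip(acoustic, lexical)]


def late_fusion(
    p_acoustic: Sequence[float],
    p_lexical: Sequence[float],
    weight: float = 0.5,      # assumed equal weighting of the two models
    threshold: float = 0.5,   # assumed decision threshold
) -> List[int]:
    """Combine per-word EC probabilities from two separate models.

    Returns 1 for words tagged as part of an Emotion Carrier, else 0.
    """
    if len(p_acoustic) != len(p_lexical):
        raise ValueError("both models must score the same words")
    fused = [
        weight * pa + (1.0 - weight) * pl
        for pa, pl in zip(p_acoustic, p_lexical)
    ]
    return [1 if p >= threshold else 0 for p in fused]


# Toy example: made-up probabilities for the words of "the loss of father".
words = ["the", "loss", "of", "father"]
p_acoustic = [0.1, 0.7, 0.4, 0.9]
p_lexical = [0.1, 0.9, 0.2, 0.7]
labels = late_fusion(p_acoustic, p_lexical)
tagged = [w for w, y in zip(words, labels) if y == 1]
```

In this sketch late fusion keeps the two models independent, so each can be trained or pretrained on its own modality; only their output scores are combined.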

