CL / SD / AS, Feb 5, 2023

Deep Learning of Segment-Level Feature Representation for Speech Emotion Recognition in Conversations

arXiv:2302.02419v1, 3 citations, h-index: 33
Originality: Incremental advance
AI Analysis

This work addresses emotion detection in dialogues, which is important for applications like human-computer interaction, but it appears incremental as it builds on existing techniques without a major breakthrough.

The paper tackles speech emotion recognition in conversations with a method that uses a pretrained VGGish model for segment-level features and an attentive GRU to model contextual dependencies. Experiments on the MELD dataset show that it is effective compared with state-of-the-art methods.

Accurately detecting emotions in conversation is a necessary yet challenging task due to the complexity of emotions and dynamics in dialogues. The emotional state of a speaker can be influenced by many different factors, such as interlocutor stimulus, dialogue scene, and topic. In this work, we propose a conversational speech emotion recognition method to deal with capturing attentive contextual dependency and speaker-sensitive interactions. First, we use a pretrained VGGish model to extract segment-based audio representations in individual utterances. Second, an attentive bi-directional gated recurrent unit (GRU) models context-sensitive information and explores intra- and inter-speaker dependencies jointly in a dynamic manner. The experiments conducted on the standard conversational dataset MELD demonstrate the effectiveness of the proposed method when compared against state-of-the-art methods.
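To make the two-stage idea in the abstract concrete, here is a minimal sketch of how segment-level embeddings might be pooled into utterance vectors and then weighted by attention over the dialogue context. It is not the authors' implementation: the bi-directional GRU is omitted entirely, mean pooling and dot-product attention are assumptions chosen for simplicity, and the 4-dimensional toy vectors stand in for real VGGish embeddings. The function names `mean_pool` and `attend` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(segments):
    """Average segment-level embeddings into one utterance-level vector.

    Assumption: simple mean pooling; the paper may aggregate differently.
    """
    count = len(segments)
    dim = len(segments[0])
    return [sum(seg[i] for seg in segments) / count for i in range(dim)]


def attend(query, context):
    """Dot-product attention of one utterance (query) over context utterances.

    Returns the attention weights and the weighted context summary.
    Assumption: plain dot-product scoring, used here only for illustration.
    """
    scores = [sum(q * c for q, c in zip(query, ctx)) for ctx in context]
    weights = softmax(scores)
    dim = len(query)
    summary = [
        sum(w * ctx[i] for w, ctx in zip(weights, context)) for i in range(dim)
    ]
    return weights, summary


# Toy dialogue: each utterance is a list of segment embeddings.
# The 4-dim vectors are made-up stand-ins for pretrained audio embeddings.
dialogue = [
    [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0, 0.0]],
    [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
]

utterances = [mean_pool(segments) for segments in dialogue]
weights, summary = attend(utterances[-1], utterances)
print(weights)
print(summary)
```

In this toy example the last utterance attends more to the first utterance than to the second, because their pooled vectors overlap. A fuller model would pass the utterance vectors through a recurrent layer before attention and add speaker information, as the abstract describes.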

