SDAS Oct 25, 2018

Multi-Channel Auto-Encoder for Speech Emotion Recognition

arXiv:1810.10662v1
4 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of improving emotion inference in voice dialogue applications, representing an incremental advancement in the field.

The paper tackles speech emotion recognition with a multi-channel auto-encoder (MTC-AE) framework that integrates local and global acoustic features. It reports 64.8% leave-one-speaker-out unweighted accuracy on the IEMOCAP dataset, 2.4% higher than the best previously reported result on that dataset.
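The 64.8% figure is described in the paper's abstract as leave-one-speaker-out unweighted accuracy. As a rough illustration of what that metric measures, the sketch below assumes the convention common in speech emotion recognition, where unweighted accuracy is the mean of per-class recalls and each speaker is held out in turn. The function names and the toy labels are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every emotion class counts equally
    (assumed definition; the paper's exact formula is not given here)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def weighted_accuracy(y_true, y_pred):
    """Plain accuracy: fraction of utterances classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def leave_one_speaker_out(speakers):
    """Yield (held_out_speaker, train_indices, test_indices) per speaker."""
    for held in sorted(set(speakers)):
        train = [i for i, s in enumerate(speakers) if s != held]
        test = [i for i, s in enumerate(speakers) if s == held]
        yield held, train, test


# Toy labels (hypothetical): a classifier that always predicts "ang".
y_true = ["ang", "ang", "ang", "ang", "sad"]
y_pred = ["ang", "ang", "ang", "ang", "ang"]
ua = unweighted_accuracy(y_true, y_pred)  # (1.0 + 0.0) / 2 = 0.5
wa = weighted_accuracy(y_true, y_pred)    # 4 / 5 = 0.8
splits = list(leave_one_speaker_out(["spk1", "spk2", "spk1"]))
```

On imbalanced data the two metrics diverge: the always-"ang" classifier above looks strong on plain accuracy but is penalized by unweighted accuracy, which is presumably why the latter is reported for IEMOCAP.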

Inferring users' emotional state from their queries plays an important role in enhancing the capability of voice dialogue applications. Although several related works have obtained satisfactory results, performance can still be improved. In this paper, we propose a novel framework named multi-channel auto-encoder (MTC-AE) for emotion recognition from acoustic information. MTC-AE contains multiple local DNNs, each based on different low-level descriptors with different statistics functions, that are partly concatenated together, which enables the structure to consider both local and global features simultaneously. Experiments on the benchmark dataset IEMOCAP show that our method significantly outperforms the existing state-of-the-art results, achieving $64.8\%$ leave-one-speaker-out unweighted accuracy, which is $2.4\%$ higher than the best result on this dataset.
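The abstract's description of multiple local networks that are partly concatenated can be pictured with a minimal forward-pass sketch. This is not the authors' MTC-AE: the layer sizes, the tanh activation, the random weights, the three example descriptors, and in particular the reading of "partly concatenated" as "only the first few units of each local code feed a shared layer" are all assumptions made for illustration. Only the encoder side is shown, with no decoder or training.

```python
import math
import random
import statistics


def functionals(contour):
    """Summarize one frame-level low-level descriptor (LLD) contour with
    utterance-level statistics functions (choice of statistics is assumed)."""
    return [statistics.mean(contour), statistics.pstdev(contour),
            min(contour), max(contour)]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for a fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, biases):
    """Fully connected layer followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def forward(channels, local_layers, global_layer, shared_units):
    """Run each channel through its own local layer, then feed part of
    every local code into one shared (global) layer."""
    local_codes = [dense(x, w, b) for x, (w, b) in zip(channels, local_layers)]
    # Assumed interpretation of "partly concatenated": only the first
    # `shared_units` values of each local code are concatenated.
    shared = [v for code in local_codes for v in code[:shared_units]]
    global_code = dense(shared, *global_layer)
    return local_codes, global_code


rng = random.Random(0)
# Hypothetical LLD contours for one utterance, 50 frames each.
lld_contours = {
    "pitch": [rng.uniform(80, 250) for _ in range(50)],
    "energy": [rng.uniform(0.0, 1.0) for _ in range(50)],
    "zcr": [rng.uniform(0.0, 0.3) for _ in range(50)],
}
channels = [functionals(c) for c in lld_contours.values()]
local_layers = [init_layer(4, 6, rng) for _ in channels]
global_layer = init_layer(3 * len(channels), 5, rng)
local_codes, global_code = forward(channels, local_layers, global_layer,
                                   shared_units=3)
```

In this sketch the unshared units of each local code stay specific to their descriptor (local features), while the shared layer mixes information across descriptors (global features). A real system would also normalize the inputs and train the weights.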

