CL AI HC May 17, 2018

Convolutional Attention Networks for Multimodal Emotion Recognition from Speech and Text Data

arXiv:1805.06606v21107 citations
Originality: Incremental advance
AI Analysis

This paper addresses emotion recognition for human-computer interaction, but the advance is incremental because it builds on existing multimodal approaches.

The paper tackled multimodal emotion recognition from speech and text data by proposing a convolutional attention network, which outperformed a shallow concatenation model on the CMU-MOSEI dataset.

Emotion recognition has become a popular topic, especially in the field of human-computer interaction. Previous works involve unimodal analysis of emotion, while recent efforts focus on multimodal emotion recognition from vision and speech. In this paper, we propose a new method of learning the hidden representations between speech and text data alone, using convolutional attention networks. Compared to the shallow model, which employs simple concatenation of feature vectors, the proposed attention model performs much better in classifying emotion from the speech and text data contained in the CMU-MOSEI dataset.
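To make the contrast in the abstract concrete, here is a minimal NumPy sketch of the two fusion styles it mentions: shallow concatenation of feature vectors versus attention across modalities. It is an illustration under assumptions, not the paper's architecture. The function names, feature dimensions, mean pooling, and the use of scaled dot-product attention are all hypothetical choices, and the convolutional layers are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def concat_fusion(speech, text):
    # Shallow baseline (assumed form): mean-pool each modality over time,
    # then concatenate the two pooled vectors.
    return np.concatenate([speech.mean(axis=0), text.mean(axis=0)])

def attention_fusion(speech, text):
    # Assumed cross-modal attention: each text step attends over speech frames.
    d = speech.shape[1]
    scores = text @ speech.T / np.sqrt(d)   # shape (T_text, T_speech)
    weights = softmax(scores, axis=1)       # each row sums to 1
    attended = weights @ speech             # shape (T_text, d)
    # Pair every text step with its attended speech summary, then pool.
    fused = np.concatenate([text, attended], axis=1).mean(axis=0)
    return fused, weights

# Random stand-ins for real features (hypothetical sizes).
rng = np.random.default_rng(0)
speech = rng.normal(size=(50, 16))  # 50 acoustic frames, 16-dim features
text = rng.normal(size=(12, 16))    # 12 tokens, 16-dim embeddings

baseline = concat_fusion(speech, text)
fused, weights = attention_fusion(speech, text)
```

Both functions return a vector of the same size, so either could feed the same downstream classifier. The difference is that the attention variant lets each token weight the speech frames individually instead of averaging them uniformly.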

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
