AI · CV · HC · Apr 16, 2018

Multi-Modal Emotion recognition on IEMOCAP Dataset using Deep Learning

arXiv:1804.05788v3 · 167 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for more accurate emotion recognition in human-computer interaction, though it is incremental, extending existing neural network approaches to multimodal data.

The paper tackled multimodal emotion recognition on the IEMOCAP dataset by using neural networks to integrate speech, text, and motion capture data, achieving improved robustness and accuracy compared to prior speech-only methods.

Emotion recognition has become an important field of research in Human-Computer Interaction as techniques for modelling the various aspects of behaviour improve. As technology advances and our understanding of emotions deepens, there is a growing need for automatic emotion recognition systems. One direction the research is taking is the use of neural networks, which are adept at estimating complex functions that depend on large and diverse sources of input data. In this paper we attempt to exploit this capability of neural networks to perform multimodal emotion recognition on the IEMOCAP dataset, using speech, text, and motion capture data of facial expressions, head rotation, and hand movements. Prior research has concentrated on emotion detection from speech on the IEMOCAP dataset, but our approach is the first to use the multiple modes of data offered by IEMOCAP for more robust and accurate emotion detection.
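The abstract does not state how the modalities are combined, so the following is only a minimal sketch of one common option, feature-level (early) fusion: each modality is reduced to a feature vector, the vectors are concatenated, and a classifier maps the result to emotion probabilities. The emotion labels, feature sizes, and the names `fuse_features` and `LinearClassifier` are hypothetical, chosen for illustration. The weights are random rather than trained, so the predicted label carries no meaning.

```python
import math
import random

# Hypothetical label set; the paper's exact emotion classes are not given here.
EMOTIONS = ["angry", "happy", "sad", "neutral"]


def fuse_features(speech, text, mocap):
    """Early (feature-level) fusion: concatenate the per-modality feature vectors."""
    return list(speech) + list(text) + list(mocap)


def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearClassifier:
    """One dense layer followed by softmax; weights are random (untrained), for illustration."""

    def __init__(self, in_dim, n_classes, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(n_classes)]
        self.bias = [0.0] * n_classes

    def predict_proba(self, x):
        # One logit per class: dot product of the class weight row with the input, plus bias.
        logits = [sum(w * xi for w, xi in zip(row, x)) + b
                  for row, b in zip(self.weights, self.bias)]
        return softmax(logits)


# Hypothetical per-modality feature sizes and constant values, for illustration only.
speech_feat = [0.2] * 34   # e.g. pooled acoustic features
text_feat = [0.1] * 50     # e.g. averaged word embeddings
mocap_feat = [0.05] * 16   # e.g. facial, head-rotation and hand-marker statistics

fused = fuse_features(speech_feat, text_feat, mocap_feat)
clf = LinearClassifier(in_dim=len(fused), n_classes=len(EMOTIONS))
probs = clf.predict_proba(fused)
print(len(fused), EMOTIONS[probs.index(max(probs))])
```

Early fusion keeps the sketch short. A real system might instead train a separate network per modality and merge their hidden representations or predictions (late fusion), which tolerates missing modalities more gracefully.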

Code Implementations: 2 repos
