CL SD AS

Mar 31, 2022

MMER: Multimodal Multi-task Learning for Speech Emotion Recognition

Sreyan Ghosh, Utkarsh Tyagi, S Ramaneswaran, Harshvardhan Srivastava, Dinesh Manocha

arXiv:2203.16794v51.928 citations Has Code

Originality: Incremental advance

AI Analysis

The paper addresses emotion recognition from speech, which matters for applications such as human-computer interaction. The contribution appears incremental because it builds on existing multimodal and multi-task learning techniques.

The paper tackles speech emotion recognition by proposing MMER, a multimodal multi-task learning approach that uses early-fusion and cross-modal self-attention between text and acoustic modalities with three auxiliary tasks, achieving state-of-the-art performance on the IEMOCAP benchmark.

In this paper, we propose MMER, a novel Multimodal Multi-task learning approach for Speech Emotion Recognition. MMER leverages a novel multimodal network based on early-fusion and cross-modal self-attention between text and acoustic modalities and solves three novel auxiliary tasks for learning emotion recognition from spoken utterances. In practice, MMER outperforms all our baselines and achieves state-of-the-art performance on the IEMOCAP benchmark. Additionally, we conduct extensive ablation studies and results analysis to prove the effectiveness of our proposed approach.
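The abstract names two mechanisms, early fusion and cross-modal attention between text and acoustic features, without giving details. The following is a minimal sketch of what those terms commonly mean, not the authors' implementation. It assumes early fusion is a concatenation of the two sequences along the time axis and that attention is plain scaled dot-product attention with no learned projections; the function names and toy inputs are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key/value rows.

    Assumption: no learned query/key/value projections, to keep the sketch short.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return out


def early_fusion(text, audio):
    """Assumed form of early fusion: concatenate both sequences along the time axis."""
    return text + audio


# Toy inputs (hypothetical): 2 token embeddings and 3 acoustic frames, both of width 2.
text = [[1.0, 0.0], [0.0, 1.0]]
audio = [[0.5, 0.5], [1.0, 1.0], [0.0, 2.0]]

# Text positions query the acoustic frames (cross-modal direction text -> audio).
text_attending_audio = cross_modal_attention(text, audio, audio)

# Self-attention over the fused sequence lets every position see both modalities.
fused = early_fusion(text, audio)
fused_self_attention = cross_modal_attention(fused, fused, fused)
```

Each output row is a weighted average of the value rows, so `text_attending_audio` has one row per text token and `fused_self_attention` has one row per position in the concatenated sequence.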

