CL · Nov 18, 2023

Joyful: Joint Modality Fusion and Graph Contrastive Learning for Multimodal Emotion Recognition

Dongyuan Li, Yusong Wang, Kotaro Funakoshi, Manabu Okumura

arXiv:2311.11009v14.372 citations · h-index: 11 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses a domain-specific problem in human-machine interaction by improving emotion recognition accuracy, though it appears incremental relative to existing graph-based methods.

The paper tackles multimodal emotion recognition by proposing a joint modality fusion and graph contrastive learning method, and reports state-of-the-art performance on three benchmark datasets.

Multimodal emotion recognition aims to recognize the emotion of each utterance from multiple modalities, and has received increasing attention for its applications in human-machine interaction. Current graph-based methods fail to simultaneously depict global contextual features and local diverse uni-modal features in a dialogue. Furthermore, as the number of graph layers increases, they easily fall into over-smoothing. In this paper, we propose a method for joint modality fusion and graph contrastive learning for multimodal emotion recognition (Joyful), where multimodal fusion, contrastive learning, and emotion recognition are jointly optimized. Specifically, we first design a new multimodal fusion mechanism that can provide deep interaction and fusion between the global contextual and uni-modal specific features. Then, we introduce a graph contrastive learning framework with inter-view and intra-view contrastive losses to learn more distinguishable representations for samples with different sentiments. Extensive experiments on three benchmark datasets indicate that Joyful achieves state-of-the-art (SOTA) performance compared to all baselines.
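
To make the "inter-view and intra-view contrastive losses" mentioned in the abstract more concrete, here is a minimal NumPy sketch of one common formulation from the graph contrastive learning literature (an InfoNCE-style objective over two augmented views of the same nodes). It is an illustration only and is not taken from the paper: the function names, the temperature `tau`, and the exact form of the denominator are assumptions, and Joyful's actual loss may differ.

```python
import numpy as np


def _l2_normalize(x):
    # Row-wise L2 normalization so that dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def one_sided_contrastive_loss(z1, z2, tau=0.5):
    """Hypothetical helper: contrast view 1 against view 2.

    z1, z2: arrays of shape (n, d) holding embeddings of the same n nodes
    (utterances) under two graph views. tau is an assumed temperature.
    """
    z1, z2 = _l2_normalize(z1), _l2_normalize(z2)
    inter = np.exp(z1 @ z2.T / tau)  # inter-view similarities (view 1 vs. view 2)
    intra = np.exp(z1 @ z1.T / tau)  # intra-view similarities (within view 1)
    pos = np.diag(inter)             # positive pair: same node in the other view
    # Negatives: other nodes in the other view (inter-view) plus
    # other nodes in the same view (intra-view, self-similarity excluded).
    denom = inter.sum(axis=1) + intra.sum(axis=1) - np.diag(intra)
    return float(np.mean(-np.log(pos / denom)))


def symmetric_contrastive_loss(z1, z2, tau=0.5):
    # Average both directions so that neither view is privileged.
    return 0.5 * (one_sided_contrastive_loss(z1, z2, tau)
                  + one_sided_contrastive_loss(z2, z1, tau))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(8, 16))
    z_aug = z + 0.01 * rng.normal(size=z.shape)  # a lightly perturbed second view
    print(symmetric_contrastive_loss(z, z_aug))
```

In a jointly optimized setup like the one the abstract describes, a term of this kind would typically be added to the emotion classification loss with a weighting coefficient; that coefficient is likewise an assumption here.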
