CL · Jan 25, 2025

Cross-modal Context Fusion and Adaptive Graph Convolutional Network for Multimodal Conversational Emotion Recognition

arXiv:2501.15063v1 · 3 citations · h-index: 1 · IJCNN

Originality: Incremental advance

AI Analysis

This work addresses emotion recognition for applications in human-computer interaction, marketing, and healthcare, but it is incremental as it builds on existing multimodal methods.

The paper tackles multimodal conversational emotion recognition by addressing two gaps in prior work: mutual interference between input modalities and the directional nature of dialogue between speakers. The resulting model is reported to surpass some state-of-the-art methods on publicly available benchmark datasets.

Emotion recognition has a wide range of applications in human-computer interaction, marketing, healthcare, and other fields. In recent years, the development of deep learning technology has provided new methods for emotion recognition. Many emotion recognition methods have been proposed, including multimodal ones, but these methods ignore the mutual interference between different input modalities and pay little attention to the directional dialogue between speakers. Therefore, this article proposes a new multimodal emotion recognition method consisting of a cross-modal context fusion module, an adaptive graph convolutional encoding module, and an emotion classification module. The cross-modal context fusion module includes a cross-modal alignment module and a context fusion module, which are used to reduce the noise introduced by mutual interference between different input modalities. The adaptive graph convolution module constructs a dialogue relationship graph for extracting inter-speaker dependencies and self-dependencies. Our model has surpassed some state-of-the-art methods on publicly available benchmark datasets and achieved high recognition accuracy.
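
The abstract does not spell out how the dialogue relationship graph is built, so the following is only a minimal sketch of one common way to set up such a graph, not the authors' implementation. It treats each utterance as a node, adds a directed edge from every utterance in a context window to the current one, and labels each edge as a self-dependency (same speaker) or an inter-speaker dependency. The function names, the window parameters, the "self"/"inter" labels, and the unweighted mean aggregation are all assumptions made for illustration; the paper's adaptive graph convolution presumably learns its weights.

```python
from typing import List, Tuple

Edge = Tuple[int, int, str]  # (source utterance, target utterance, relation)


def build_dialogue_graph(
    speakers: List[str], past_window: int = 2, future_window: int = 0
) -> List[Edge]:
    """Build directed edges over utterance indices (hypothetical helper).

    An edge (src, dst, relation) means utterance `src` influences `dst`.
    The relation is "self" when both utterances share a speaker (this
    includes the self-loop src == dst) and "inter" otherwise.
    """
    edges: List[Edge] = []
    n = len(speakers)
    for dst in range(n):
        first = max(0, dst - past_window)
        last = min(n - 1, dst + future_window)
        for src in range(first, last + 1):
            relation = "self" if speakers[src] == speakers[dst] else "inter"
            edges.append((src, dst, relation))
    return edges


def aggregate(features: List[List[float]], edges: List[Edge]) -> List[List[float]]:
    """One round of unweighted mean aggregation over incoming edges.

    This stands in for a graph convolution layer; it has no learned
    parameters and ignores the relation label.
    """
    dim = len(features[0])
    sums = [[0.0] * dim for _ in features]
    counts = [0] * len(features)
    for src, dst, _relation in edges:
        for k in range(dim):
            sums[dst][k] += features[src][k]
        counts[dst] += 1
    # Every node has at least its own self-loop, so counts are never zero.
    return [[value / count for value in row] for row, count in zip(sums, counts)]


# Three utterances: speaker A, then B, then A again.
edges = build_dialogue_graph(["A", "B", "A"], past_window=1)
updated = aggregate([[1.0], [3.0], [5.0]], edges)
```

With `past_window=1`, each utterance receives an edge from itself and from the utterance just before it, so the direction of the conversation is preserved: earlier turns influence later ones but not the reverse unless `future_window` is raised.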
