CV · AI · Sep 19, 2024

Leveraging Retrieval Augment Approach for Multimodal Emotion Recognition Under Missing Modalities

arXiv:2410.02804v1 · 3 citations · h-index: 6 · Has Code
Originality: Incremental advance
AI Analysis

The work addresses a practical challenge in emotion recognition for applications such as human-computer interaction, though it is incremental because it builds on existing retrieval and reconstruction approaches.

The paper tackles multimodal emotion recognition when some modalities are missing. It proposes a retrieval-augmented framework that retrieves similar multimodal data to fill the gaps, and it reports performance superior to state-of-the-art methods in its experiments.

Multimodal emotion recognition (MER) relies on complete multimodal information and robust multimodal joint representations to achieve high performance. However, full modality integrity is rarely guaranteed in practice, and some modalities are often missing. For example, video, audio, or text data may be lost due to sensor failure or limited network bandwidth, which poses a major challenge to MER research. Traditional methods extract useful information from the available modalities and reconstruct the missing ones to learn a robust multimodal joint representation. These methods have laid a solid foundation for research in this field and, to a certain extent, have eased the difficulty of multimodal emotion recognition under missing modalities. However, relying solely on internal reconstruction and multimodal joint learning has its limitations, especially when the missing information is critical for emotion recognition. To address this challenge, we propose a novel framework, Retrieval Augment for Missing Modality Multimodal Emotion Recognition (RAMER), which introduces similar multimodal emotion data to enhance emotion recognition performance under missing modalities. By leveraging databases that contain related multimodal emotion data, we can retrieve similar multimodal emotion information to fill in the gaps left by missing modalities. Experimental results demonstrate that our framework outperforms existing state-of-the-art approaches on missing-modality MER tasks. Our whole project is publicly available at https://github.com/WooyoohL/Retrieval_Augment_MER.
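The retrieval idea in the abstract can be illustrated with a minimal sketch. It is not the RAMER implementation: it assumes, for illustration only, that each database item stores an audio feature vector and a text feature vector, that similarity is cosine similarity on the available modality, and that the missing modality is filled with the mean of the retrieved neighbours' features. The function name `retrieve_fill` and all data are hypothetical; the paper's actual retrieval and fusion steps may differ.

```python
import numpy as np

def retrieve_fill(query_audio, db_audio, db_text, k=2):
    """Fill a missing text feature by averaging the text features of the
    k database items whose audio features are most similar to the query.
    Illustrative assumption, not the method described in the paper."""
    # Cosine similarity between the query and every database audio vector.
    q = query_audio / np.linalg.norm(query_audio)
    d = db_audio / np.linalg.norm(db_audio, axis=1, keepdims=True)
    sims = d @ q
    # Indices of the k most similar items, best match first.
    top = np.argsort(-sims)[:k]
    return db_text[top].mean(axis=0), top

# Toy database: three items with paired audio and text features.
db_audio = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
db_text = np.array([[1.0, 1.0], [3.0, 1.0], [9.0, 9.0]])

# The query has audio only; its text modality is missing.
filled_text, neighbours = retrieve_fill(np.array([1.0, 0.05]), db_audio, db_text, k=2)
print(neighbours, filled_text)
```

In this toy case the two nearest audio neighbours are items 0 and 1, so the filled text feature is the mean of their text vectors. A real system would feed the retrieved features into the emotion classifier rather than use them directly.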
