CL SD AS · Apr 17, 2019

Complementary Fusion of Multi-Features and Multi-Modalities in Sentiment Analysis

Feiyang Chen, Ziqian Luo, Yanyan Xu, Dengfeng Ke

arXiv:1904.08138v52.087 citations

Originality: Incremental advance

AI Analysis

This work addresses sentiment analysis for real-world multimodal data, showing incremental improvements in accuracy and generalization.

The paper tackles multimodal sentiment analysis by proposing a fusion strategy for audio and text data. It reports very competitive results on CMU-MOSI and CMU-MOSEI and new state-of-the-art results on IEMOCAP.

Sentiment analysis, mostly based on text, has developed rapidly over the last decade and attracted widespread attention in both academia and industry. However, real-world information usually comes from multiple modalities, such as audio and text. In this paper, we therefore consider multimodal sentiment analysis based on audio and text, and propose a novel fusion strategy that combines multi-feature fusion and multi-modality fusion to improve the accuracy of audio-text sentiment analysis. We call the resulting model DFF-ATMF (Deep Feature Fusion - Audio and Text Modality Fusion). It consists of two parallel branches, an audio-modality branch and a text-modality branch, and its core mechanisms are the fusion of multiple feature vectors and multiple modality attention. Experiments on the CMU-MOSI dataset and the recently released CMU-MOSEI dataset, both collected from YouTube for sentiment analysis, show that DFF-ATMF achieves very competitive results. Furthermore, using attention weight distribution heatmaps, we demonstrate that the deep features learned by DFF-ATMF are complementary to each other and robust. Surprisingly, DFF-ATMF also achieves new state-of-the-art results on the IEMOCAP dataset, indicating that the proposed fusion strategy also generalizes well to multimodal emotion recognition.
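The two fusion steps named in the abstract can be illustrated with a small NumPy sketch. This is not the DFF-ATMF implementation: the abstract does not specify layer types, dimensions, or the attention scoring function, so everything below is an assumption made for illustration. Multi-feature fusion is shown as plain concatenation, and multi-modality attention as a softmax over dot-product scores against a query vector. The names `fuse_features`, `fuse_modalities`, `w_audio`, `w_text`, and `query` are hypothetical, and random values stand in for learned parameters.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def fuse_features(feature_vectors):
    """Multi-feature fusion, sketched here as simple concatenation (assumption)."""
    return np.concatenate(feature_vectors)


def fuse_modalities(audio_vec, text_vec, w_audio, w_text, query):
    """Multi-modality attention sketch (assumed form, not the paper's).

    Each branch is projected into a shared space, scored against a query
    vector, and the two projections are combined with softmax weights.
    """
    branches = np.stack([
        np.tanh(w_audio @ audio_vec),  # audio branch in the shared space
        np.tanh(w_text @ text_vec),    # text branch in the shared space
    ])                                 # shape: (2, d)
    weights = softmax(branches @ query)  # one attention weight per modality
    fused = weights @ branches           # weighted sum, shape: (d,)
    return fused, weights


# Random stand-ins for extracted features and learned parameters.
rng = np.random.default_rng(0)
audio_feats = [rng.normal(size=4), rng.normal(size=3)]  # two hypothetical acoustic features
text_feats = [rng.normal(size=5)]                       # one hypothetical text feature

audio_vec = fuse_features(audio_feats)  # length 7
text_vec = fuse_features(text_feats)    # length 5

d = 6  # size of the shared space (arbitrary choice)
w_audio = rng.normal(size=(d, audio_vec.size))
w_text = rng.normal(size=(d, text_vec.size))
query = rng.normal(size=d)

fused, weights = fuse_modalities(audio_vec, text_vec, w_audio, w_text, query)
print("modality attention weights (audio, text):", weights)
print("fused representation shape:", fused.shape)
```

In a trained model the per-modality weights would be the kind of quantity an attention heatmap visualizes. Here they are arbitrary because the parameters are random.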
