MM, Dec 15, 2020

A Deep Multi-Level Attentive network for Multimodal Sentiment Analysis

arXiv:2012.08256v1, 81 citations
AI Analysis

This work aims to improve multimodal sentiment classification performance for social media content analysis, particularly for applications requiring the understanding of combined image and text data.

This paper addresses the challenge of multimodal sentiment analysis by proposing a Deep Multi-Level Attentive network. The method focuses on exploiting correlations between image and text modalities, leading to superior performance on four real-world datasets.

Multimodal sentiment analysis has attracted increasing attention and has broad application prospects. Existing methods focus on a single modality and therefore fail to capture social media content that spans multiple modalities. Moreover, most work in multimodal learning has simply combined the two modalities without exploring the complicated correlations between them, which has resulted in unsatisfactory performance for multimodal sentiment classification. Motivated by this status quo, we propose a Deep Multi-Level Attentive network, which exploits the correlation between image and text modalities to improve multimodal learning. Specifically, we generate a bi-attentive visual map along the spatial and channel dimensions to magnify the CNN's representational power. We then model the correlation between image regions and word semantics by applying semantic attention to extract the textual features related to the bi-attentive visual features. Finally, self-attention is employed to automatically fetch the sentiment-rich multimodal features for classification. We conduct extensive evaluations on four real-world datasets, namely MVSA-Single, MVSA-Multiple, Flickr, and Getty Images, which verify the superiority of our method.
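
The three attention stages named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the layer definitions, so the channel attention here is a generic squeeze-and-excitation-style gate, the spatial attention is a sigmoid over the channel-averaged map, and the semantic and self-attention steps use plain scaled dot-product attention without learned projections. All function names, shapes, and the random weights `w1` and `w2` are hypothetical and chosen only for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def bi_attentive_map(feat, w1, w2):
    """Re-weight a CNN feature map along channel, then spatial, dimensions.

    feat: (C, H, W) feature map; w1: (C, R) and w2: (R, C) are assumed
    bottleneck weights for the channel gate (hypothetical, untrained here).
    """
    # Channel attention: pool each channel to a scalar, gate it through a small MLP.
    pooled = feat.mean(axis=(1, 2))                          # (C,)
    channel_gate = sigmoid(np.maximum(pooled @ w1, 0.0) @ w2)  # (C,)
    feat = feat * channel_gate[:, None, None]
    # Spatial attention: one gate per location from the channel-averaged map.
    spatial_gate = sigmoid(feat.mean(axis=0))                # (H, W)
    return feat * spatial_gate[None, :, :]


def semantic_attention(words, visual):
    """Weight word features by their similarity to a visual vector.

    words: (T, D) word features; visual: (D,) pooled visual feature.
    Returns the attended text vector (D,) and the word weights (T,).
    """
    scores = words @ visual / np.sqrt(words.shape[1])
    weights = softmax(scores)
    return weights @ words, weights


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections. tokens: (N, D)."""
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    return softmax(scores, axis=-1) @ tokens


# Toy inputs standing in for CNN and word-embedding outputs (assumed sizes).
rng = np.random.default_rng(0)
C, H, W, T = 8, 4, 4, 5
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C, 2))
w2 = rng.standard_normal((2, C))
words = rng.standard_normal((T, C))

visual_map = bi_attentive_map(feat, w1, w2)                  # (C, H, W)
visual_vec = visual_map.mean(axis=(1, 2))                    # (C,)
text_vec, word_weights = semantic_attention(words, visual_vec)
# Fuse the two modality vectors and flatten for a downstream classifier.
fused = self_attention(np.stack([visual_vec, text_vec])).reshape(-1)
```

In a trained model the gates and projections would be learned and `fused` would feed a sentiment classifier; here the sketch only shows how the visual features condition the choice of textual features before the two are combined.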
