CL · Jun 28, 2022

MACSA: A Multimodal Aspect-Category Sentiment Analysis Dataset with Multimodal Fine-grained Aligned Annotations

arXiv:2206.13969v1 · 6 citations · h-index: 47
Originality: Synthesis-oriented
AI Analysis

This work addresses a gap in multimodal sentiment analysis by providing annotations aligned across text and images. It is incremental, however, because it builds on existing datasets.

The authors tackle the lack of multimodal fine-grained sentiment datasets by creating MACSA, a dataset of over 21K text-image pairs that uses aspect categories to align fine-grained elements across the two modalities. They also propose a graph-based model that serves as a baseline for future comparisons.

Multimodal fine-grained sentiment analysis has recently attracted increasing attention due to its broad applications. However, existing multimodal fine-grained sentiment datasets mostly annotate fine-grained elements in text and ignore those in images, so fine-grained elements in visual content do not receive the attention they deserve. In this paper, we propose a new dataset, the Multimodal Aspect-Category Sentiment Analysis (MACSA) dataset, which contains more than 21K text-image pairs. The dataset provides fine-grained annotations for both textual and visual content and is the first to use the aspect category as the pivot for aligning fine-grained elements between the two modalities. Based on this dataset, we propose the Multimodal ACSA task and a multimodal graph-based aligned model (MGAM), which adopts a fine-grained cross-modal fusion method. Experimental results show that our method can serve as a baseline for comparison in future research on this corpus. We will make the dataset and code publicly available.
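To make the idea of an aspect category acting as a cross-modal pivot more concrete, here is a minimal sketch of one possible way to represent a MACSA-style sample. It is an illustration under stated assumptions, not the released data format: the class names (`AspectAnnotation`, `Sample`), the field names, the use of character offsets for text spans and pixel boxes for image regions, and the example sentence and image path are all hypothetical. Each aspect category becomes a graph node that links the text spans and image regions annotated with it, so a textual element and a visual element sharing a category are two hops apart. The sketch covers only this alignment structure; it does not reproduce MGAM or its cross-modal fusion method.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical schema for illustration only; not the official MACSA file format.
@dataclass
class AspectAnnotation:
    category: str   # aspect category, e.g. "food"; acts as the cross-modal pivot
    polarity: str   # sentiment toward this category, e.g. "positive"
    text_spans: list = field(default_factory=list)     # (start, end) character offsets, end exclusive
    image_regions: list = field(default_factory=list)  # (x1, y1, x2, y2) pixel boxes

@dataclass
class Sample:
    text: str
    image_path: str
    aspects: list  # list of AspectAnnotation

def build_alignment_graph(sample):
    """Undirected adjacency sets: each aspect-category node links to its text spans and image regions."""
    adj = defaultdict(set)
    for ann in sample.aspects:
        cat = ("category", ann.category)
        adj[cat]  # touching the key creates the category node even if it has no linked elements
        for span in ann.text_spans:
            adj[cat].add(("text", span))
            adj[("text", span)].add(cat)
        for box in ann.image_regions:
            adj[cat].add(("image", box))
            adj[("image", box)].add(cat)
    return adj

def cross_modal_pairs(adj):
    """Return (text span, image region) pairs that share an aspect-category pivot."""
    pairs = set()
    for node, neighbours in adj.items():
        if node[0] != "category":
            continue
        spans = [n[1] for n in neighbours if n[0] == "text"]
        boxes = [n[1] for n in neighbours if n[0] == "image"]
        for span in spans:
            for box in boxes:
                pairs.add((span, box))
    return pairs

# Toy example with made-up annotations.
sample = Sample(
    text="The pasta was great but the service was slow.",
    image_path="example.jpg",  # hypothetical path
    aspects=[
        AspectAnnotation("food", "positive", text_spans=[(4, 9)], image_regions=[(10, 20, 200, 180)]),
        AspectAnnotation("service", "negative", text_spans=[(28, 35)]),
    ],
)

graph = build_alignment_graph(sample)
pairs = cross_modal_pairs(graph)
print(pairs)  # {((4, 9), (10, 20, 200, 180))}
```

Routing alignment through a category node, rather than linking text spans to image regions directly, means an aspect that appears in only one modality (here, `service`) still has its own node and sentiment label even though it yields no cross-modal pair.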
