CV | AI | Feb 21, 2025

SentiFormer: Metadata Enhanced Transformer for Image Sentiment Analysis

arXiv:2502.15322v1 | 1 citation | h-index: 7 | ICASSP
Originality: Incremental advance
AI Analysis

This work addresses image sentiment analysis for internet users by enhancing metadata integration, though it is incremental as it builds on existing transformer and fusion techniques.

The paper tackles the problem of insufficient use of metadata in image sentiment analysis by proposing SentiFormer, a transformer-based model that fuses multiple metadata with images, achieving superior results on three public datasets.

As more internet users post images online to express their daily emotions, image sentiment analysis has attracted increasing attention. Recent work has mostly designed neural networks that extract visual features from images for sentiment analysis. Despite significant progress, metadata, i.e., the data describing the image (e.g., text descriptions and keyword tags), has not been sufficiently explored in this task. In this paper, we propose a novel Metadata Enhanced Transformer for sentiment analysis (SentiFormer) that fuses multiple metadata and the corresponding image in a unified framework. Specifically, we first obtain multiple metadata of the image and unify the representations of the diverse data. To adaptively learn appropriate weights for each metadata, we then design an adaptive relevance learning module that highlights more effective information while suppressing weaker information. Moreover, we develop a cross-modal fusion module to fuse the adaptively learned representations and make the final prediction. Extensive experiments on three publicly available datasets demonstrate the superiority and rationality of our proposed method.
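
The abstract names three steps (unified representations, adaptive relevance weighting, cross-modal fusion) without giving formulas. The sketch below illustrates the general idea only and is not the paper's implementation. It assumes each metadata item and the image are already embedded as vectors of equal length, scores relevance with a scaled dot product, and fuses by concatenation followed by a linear classifier. The function names (`relevance_weights`, `fuse`, `predict`) and all toy values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def relevance_weights(image_vec, metadata_vecs):
    """Assumed scoring rule: scaled dot product between the image
    vector and each metadata vector, normalized with softmax."""
    scale = math.sqrt(len(image_vec))
    return softmax([dot(image_vec, m) / scale for m in metadata_vecs])


def fuse(image_vec, metadata_vecs):
    """Assumed fusion rule: relevance-weighted sum of the metadata
    vectors, concatenated with the image vector."""
    weights = relevance_weights(image_vec, metadata_vecs)
    dim = len(image_vec)
    pooled = [
        sum(w * m[i] for w, m in zip(weights, metadata_vecs))
        for i in range(dim)
    ]
    return image_vec + pooled, weights


def predict(fused, class_weights):
    """Linear classifier over the fused vector, returning class probabilities."""
    return softmax([dot(fused, row) for row in class_weights])


# Toy inputs (hypothetical values, not taken from the paper).
image_vec = [1.0, 0.0, 0.5, 0.0]
metadata_vecs = [
    [0.9, 0.1, 0.4, 0.0],  # stands in for a text description
    [0.0, 1.0, 0.0, 0.2],  # stands in for keyword tags
]
class_weights = [
    [0.5, 0.0, 0.5, 0.0, 0.5, 0.0, 0.5, 0.0],   # "positive" row
    [0.0, 0.5, 0.0, 0.5, 0.0, 0.5, 0.0, 0.5],   # "negative" row
]

fused, weights = fuse(image_vec, metadata_vecs)
probs = predict(fused, class_weights)
```

In this toy setup the metadata vector that aligns with the image receives the larger weight, which is the "highlight effective information, suppress weaker information" behaviour the abstract describes. A real model would learn the projections and the classifier rather than fix them by hand.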

Code Implementations: 1 repo
