CL · Feb 5, 2025

LLaVAC: Fine-tuning LLaVA as a Multimodal Sentiment Classifier

arXiv:2502.02938v1 · 5 citations · h-index: 4 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses sentiment classification from images and text, but it is incremental as it applies an existing method to a specific domain.

The authors tackled multimodal sentiment analysis by fine-tuning LLaVA with a structured prompt, achieving state-of-the-art performance on the MVSA-Single dataset across three data processing procedures.

We present LLaVAC, a method for constructing a classifier for multimodal sentiment analysis. This method leverages fine-tuning of the Large Language and Vision Assistant (LLaVA) to predict sentiment labels across both image and text modalities. Our approach involves designing a structured prompt that incorporates both unimodal and multimodal labels to fine-tune LLaVA, enabling it to perform sentiment classification effectively. Experiments on the MVSA-Single dataset demonstrate that LLaVAC outperforms existing methods in multimodal sentiment analysis across three data processing procedures. The implementation of LLaVAC is publicly available at https://github.com/tchayintr/llavac.
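The structured-prompt idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual prompt template, so everything here is an assumption made for illustration: the `<image>` placeholder, the wording of the instruction, the three-way label set, and the hypothetical helpers `build_example` and `parse_response`. The authors' real format is in their repository.

```python
import re

# Assumed label set (positive / neutral / negative); not taken from the abstract.
LABELS = ("positive", "neutral", "negative")


def build_example(text, image_label, text_label, multimodal_label):
    """Build one hypothetical fine-tuning pair: a prompt and its target answer.

    The target lists the two unimodal labels (image, text) followed by the
    multimodal label, mirroring the idea of a prompt that covers both.
    """
    for label in (image_label, text_label, multimodal_label):
        if label not in LABELS:
            raise ValueError(f"unknown sentiment label: {label!r}")
    prompt = (
        "<image>\n"  # assumed placeholder marking where the image is inserted
        f"Text: {text}\n"
        "Classify the sentiment of the image, the text, and the image-text "
        "pair as positive, neutral, or negative."
    )
    target = (
        f"image: {image_label}\n"
        f"text: {text_label}\n"
        f"multimodal: {multimodal_label}"
    )
    return {"prompt": prompt, "target": target}


def parse_response(response):
    """Extract the labels from a generated answer in the format above.

    Fields that are missing or carry an unknown label are simply left out.
    """
    found = {}
    pattern = r"(image|text|multimodal)\s*:\s*(\w+)"
    for key, value in re.findall(pattern, response.lower()):
        if value in LABELS:
            found.setdefault(key, value)  # keep the first valid label per field
    return found
```

In this sketch the classifier's prediction would be the `multimodal` entry returned by `parse_response`; the unimodal entries exist only because the prompt asks for them.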

Code Implementations: 1 repo