CL · May 27, 2025

On VLMs for Diverse Tasks in Multimodal Meme Classification

arXiv:2505.20937v1 · 1 citation · h-index: 5
Originality: Incremental advance
AI Analysis

The paper addresses meme classification for social media analysis and presents an incremental improvement over existing methods.

It tackles meme classification tasks by combining vision-language models (VLMs) with fine-tuned language models (LLMs), improving baseline performance by 8.34%, 3.52%, and 26.24% for sarcasm, offensive, and sentiment classification, respectively.

In this paper, we present a comprehensive and systematic analysis of vision-language models (VLMs) for disparate meme classification tasks. We introduce a novel approach that generates a VLM-based understanding of meme images and fine-tunes LLMs on a textual understanding of the embedded meme text to improve performance. Our contributions are threefold: (1) benchmarking VLMs with diverse prompting strategies tailored to each sub-task; (2) evaluating LoRA fine-tuning across all VLM components to assess performance gains; and (3) proposing a novel approach in which detailed meme interpretations generated by VLMs are used to train smaller language models (LLMs), significantly improving classification. The strategy of combining VLMs with LLMs improved the baseline performance by 8.34%, 3.52%, and 26.24% for sarcasm, offensive, and sentiment classification, respectively. Our results reveal the strengths and limitations of VLMs and present a novel strategy for meme understanding.
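
The third contribution (VLM-generated interpretations used as training text for a smaller language model) can be illustrated with a minimal data-preparation sketch. This is not the authors' implementation: the `Meme` record, the prompt layout, and the `interpret` callable standing in for a VLM are all hypothetical names chosen for illustration, and the actual fine-tuning step is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Meme:
    image_path: str     # path to the meme image
    embedded_text: str  # text overlaid on the meme (assumed already extracted)
    label: str          # gold label for one task, e.g. "sarcastic"


def build_training_text(meme: Meme, interpretation: str, task: str) -> str:
    """Join the embedded text and the VLM interpretation into one LM input.

    The field names and ordering here are an assumption, not the paper's format.
    """
    return (
        f"Task: {task} classification\n"
        f"Meme text: {meme.embedded_text}\n"
        f"Image interpretation: {interpretation}\n"
        f"Label:"
    )


def make_dataset(
    memes: List[Meme],
    interpret: Callable[[str], str],  # hypothetical stand-in for a VLM call
    task: str,
) -> List[Dict[str, str]]:
    """Stage 1: ask the VLM for an interpretation; stage 2: emit LM training rows."""
    rows = []
    for meme in memes:
        interpretation = interpret(meme.image_path)
        rows.append(
            {
                "input": build_training_text(meme, interpretation, task),
                "target": meme.label,
            }
        )
    return rows


def fake_vlm(image_path: str) -> str:
    """Placeholder interpreter so the sketch runs without any model."""
    return f"A placeholder description of {image_path}."
```

In practice `fake_vlm` would be replaced by a call to whichever VLM is being evaluated, and the resulting rows would feed a standard supervised fine-tuning loop for the language model.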

