CLMMSIMar 15, 2025

Seeing Sarcasm Through Different Eyes: Analyzing Multimodal Sarcasm Perception in Large Vision-Language Models

arXiv:2503.12149v32 citationsh-index: 7Has CodeIEEE Trans Comput Soc Syst
Originality Synthesis-oriented
AI Analysis

This work addresses the problem of inconsistent sarcasm interpretation in AI models for researchers and developers, offering insights into multi-perspective modeling, though it is incremental as it builds on existing datasets and methods.

The study analyzed how 12 large vision-language models interpret multimodal sarcasm across 2,409 samples, finding notable discrepancies in their perceptions and highlighting the subjectivity of sarcasm beyond binary labeling.

With the advent of large vision-language models (LVLMs) demonstrating increasingly human-like abilities, a pivotal question emerges: do different LVLMs interpret multimodal sarcasm differently, and can a single model grasp sarcasm from multiple perspectives like humans? To explore this, we introduce an analytical framework using systematically designed prompts on existing multimodal sarcasm datasets. Evaluating 12 state-of-the-art LVLMs over 2,409 samples, we examine interpretive variations within and across models, focusing on confidence levels, alignment with dataset labels, and recognition of ambiguous "neutral" cases. We further validate our findings on a diverse 100-sample mini-benchmark, incorporating multiple datasets, expanded prompt variants, and representative commercial LVLMs. Our findings reveal notable discrepancies -- across LVLMs and within the same model under varied prompts. While classification-oriented prompts yield higher internal consistency, models diverge markedly when tasked with interpretive reasoning. These results challenge binary labeling paradigms by highlighting sarcasm's subjectivity. We advocate moving beyond rigid annotation schemes toward multi-perspective, uncertainty-aware modeling, offering deeper insights into multimodal sarcasm comprehension. Our code and data are available at: https://github.com/CoderChen01/LVLMSarcasmAnalysis

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes