CL CY | May 18, 2024

MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing

Siddhant Agarwal, Shivam Sharma, Preslav Nakov, Tanmoy Chakraborty

arXiv:2405.11215v1 | 16.229 citations | h-index: 47 | ACL

Originality: Incremental advance

AI Analysis

This work addresses the need for better interpretation of memes in multimodal communication, though it is incremental as it builds on existing research in harm detection and semantic labeling.

The authors tackle multimodal question answering for memes by introducing MemeMQA, a question-answering framework, together with ARSENAL, a two-stage multimodal model that improves answer prediction accuracy by ~18% over the best baseline and provides coherent explanations.

Memes have evolved as a prevalent medium for diverse communication, ranging from humour to propaganda. With the rising popularity of image-focused content, there is a growing need to explore its potential harm from different aspects. Previous studies have analyzed memes in closed settings - detecting harm, applying semantic labels, and offering natural language explanations. To extend this research, we introduce MemeMQA, a multimodal question-answering framework aiming to solicit accurate responses to structured questions while providing coherent explanations. We curate MemeMQACorpus, a new dataset featuring 1,880 questions related to 1,122 memes with corresponding answer-explanation pairs. We further propose ARSENAL, a novel two-stage multimodal framework that leverages the reasoning capabilities of LLMs to address MemeMQA. We benchmark MemeMQA using competitive baselines and demonstrate its superiority - ~18% enhanced answer prediction accuracy and distinct text generation lead across various metrics measuring lexical and semantic alignment over the best baseline. We analyze ARSENAL's robustness through diversification of question-set, confounder-based evaluation regarding MemeMQA's generalizability, and modality-specific assessment, enhancing our understanding of meme interpretation in the multimodal communication landscape.

