CL | SI | Jan 25, 2025

Figurative-cum-Commonsense Knowledge Infusion for Multimodal Mental Health Meme Classification

Microsoft
arXiv:2501.15321v1 | 10 citations | h-index: 7 | WWW
Originality: Incremental advance
AI Analysis

This work addresses the challenge of mental health symptom identification in memes for researchers and practitioners, but it is incremental as it builds on existing multimodal models with domain-specific enhancements.

The paper tackles the classification of mental health symptoms in memes by addressing the limited ability of multimodal language models to interpret figurative language and commonsense knowledge, reporting improvements of 4.20% and 4.66% on the weighted-F1 metric.

The expression of mental health symptoms through non-traditional means, such as memes, has gained remarkable attention over the past few years, with users often highlighting their mental health struggles through figurative intricacies within memes. While humans rely on commonsense knowledge to interpret these complex expressions, current Multimodal Language Models (MLMs) struggle to capture the figurative aspects inherent in memes. To address this gap, we introduce a novel dataset, AxiOM, derived from the GAD anxiety questionnaire, which categorizes memes into six fine-grained anxiety symptoms. Next, we propose a commonsense and domain-enriched framework, M3H, to enhance MLMs' ability to interpret figurative language and commonsense knowledge. The overarching goal remains to first understand and then classify the mental health symptoms expressed in memes. We benchmark M3H against six competitive baselines (with 20 variations), demonstrating improvements in both quantitative and qualitative metrics, including a detailed human evaluation. We observe a clear improvement of 4.20% and 4.66% on the weighted-F1 metric. To assess generalizability, we perform extensive experiments on a public dataset, RESTORE, for depressive symptom identification, and present an extensive ablation study that highlights the contribution of each module on both datasets. Our findings reveal limitations in existing models and the advantage of employing commonsense to enhance figurative understanding.
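
For readers unfamiliar with the weighted-F1 metric cited above, the following is a minimal sketch of how it is commonly computed: per-class F1 scores averaged with weights proportional to each class's support. It is not taken from the paper's code; the function name `weighted_f1` and the labels "A", "B", "C" are hypothetical stand-ins for symptom classes, and the paper's exact evaluation setup may differ.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (illustrative sketch)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Each class contributes in proportion to its share of true labels.
        score += (n / total) * f1
    return score


# Hypothetical labels standing in for symptom classes.
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "C", "C"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because the weights follow class frequency, frequent classes dominate the score, which is one reason the metric is often reported on imbalanced multi-class datasets.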

Code Implementations: 1 repo
