MM CL, Sep 1, 2024

Multimodal Multi-turn Conversation Stance Detection: A Challenge Dataset and Effective Model

Fuqiang Niu, Zebang Cheng, Xianghua Fu, Xiaojiang Peng, Genan Dai, Yin Chen, Hu Huang, Bowen Zhang

arXiv:2409.00597v19.225 citations, h-index: 8

Originality: Incremental advance

AI Analysis

This paper addresses the problem of detecting stances in multimodal social media conversations. The advance is incremental: it builds on existing multimodal stance detection by adding conversational context.

The authors tackled the lack of datasets for multimodal stance detection in conversational contexts by introducing MmMtCSD, a new dataset, and proposed MLLM-SD, a model that achieved state-of-the-art performance on this dataset.

Stance detection, which aims to identify public opinion towards specific targets using social media data, is an important yet challenging task. With the proliferation of diverse multimodal social media content including text and images, multimodal stance detection (MSD) has become a crucial research area. However, existing MSD studies have focused on modeling stance within individual text-image pairs, overlooking the multi-party conversational contexts that naturally occur on social media. This limitation stems from a lack of datasets that authentically capture such conversational scenarios, hindering progress in conversational MSD. To address this, we introduce a new multimodal multi-turn conversational stance detection dataset (called MmMtCSD). To derive stances from this challenging dataset, we propose a novel multimodal large language model stance detection framework (MLLM-SD) that learns joint stance representations from textual and visual modalities. Experiments on MmMtCSD show state-of-the-art performance of our proposed MLLM-SD approach for multimodal stance detection. We believe that MmMtCSD will contribute to advancing real-world applications of stance detection research.
