PRISM of Opinions: A Persona-Reasoned Multimodal Framework for User-centric Conversational Stance Detection
This work addresses stance detection in multimodal social media conversations for researchers and practitioners, offering a user-centric approach that is incremental by building on existing MCSD methods.
The paper tackles the problem of Multimodal Conversational Stance Detection (MCSD) by addressing limitations like pseudo-multimodality and user homogeneity, introducing the U-MStance dataset with over 40k annotated comments and the PRISM model, which achieves significant gains over baselines in experiments.
The rapid proliferation of multimodal social media content has driven research in Multimodal Conversational Stance Detection (MCSD), which aims to interpret users' attitudes toward specific targets within complex discussions. However, existing studies remain limited by: **1) pseudo-multimodality**, where visual cues appear only in source posts while comments are treated as text-only, misaligning with real-world multimodal interactions; and **2) user homogeneity**, where diverse users are treated uniformly, neglecting personal traits that shape stance expression. To address these issues, we introduce **U-MStance**, the first user-centric MCSD dataset, containing over 40k annotated comments across six real-world targets. We further propose **PRISM**, a **P**ersona-**R**easoned mult**I**modal **S**tance **M**odel for MCSD. PRISM first derives longitudinal user personas from historical posts and comments to capture individual traits, then aligns textual and visual cues within conversational context via Chain-of-Thought to bridge semantic and pragmatic gaps across modalities. Finally, a mutual task reinforcement mechanism is employed to jointly optimize stance detection and stance-aware response generation for bidirectional knowledge transfer. Experiments on U-MStance demonstrate that PRISM yields significant gains over strong baselines, underscoring the effectiveness of user-centric and context-grounded multimodal reasoning for realistic stance understanding.