CL · AI · Apr 4, 2025

Stance-Driven Multimodal Controlled Statement Generation: New Dataset and Task

arXiv:2504.03295v1 · 1 citation · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses the need for better controllable text generation in political discourse, though it is incremental as it builds on existing multimodal and stance detection research.

The paper tackles the problem of generating stance-controlled responses for multimodal political content by introducing a new dataset, StanceGen2024, and a framework, SDMG, which improves semantic consistency and stance control, achieving a 15% increase in stance accuracy over baseline methods.

Formulating statements that support diverse or controversial stances on specific topics is vital for platforms that enable user expression, reshape political discourse, and drive social critique and information dissemination. With the rise of Large Language Models (LLMs), controllable text generation towards specific stances has become a promising research area with applications in shaping public opinion and commercial marketing. However, current datasets often focus solely on pure text and lack multimodal content and effective context, particularly for stance detection. In this paper, we formally define and study the new problem of stance-driven controllable content generation for tweets with text and images: given a multimodal post (text and image/video), a model generates a stance-controlled response. To this end, we create the Multimodal Stance Generation Dataset (StanceGen2024), the first resource explicitly designed for multimodal stance-controllable text generation in political discourse. It includes posts and user comments from the 2024 U.S. presidential election, featuring text, images, videos, and stance annotations to explore how multimodal political content shapes stance expression. Furthermore, we propose a Stance-Driven Multimodal Generation (SDMG) framework that integrates weighted fusion of multimodal features and stance guidance to improve semantic consistency and stance control. We release the dataset and code (https://anonymous.4open.science/r/StanceGen-BE9D) for public use and further research.
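The abstract does not spell out how SDMG performs its weighted fusion, so the following is only a minimal sketch of one common way to combine per-modality feature vectors with a stance embedding. The function names (`softmax`, `weighted_fusion`), the use of softmax-normalized scalar scores as weights, and the toy vectors are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fusion(features, scores):
    """Fuse same-length feature vectors as a softmax-weighted sum.

    features: list of vectors (hypothetical text, image, and stance embeddings)
    scores:   one raw importance score per vector (assumed to be learned elsewhere)
    """
    if len(features) != len(scores):
        raise ValueError("need exactly one score per feature vector")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must share one dimension")
    weights = softmax(scores)
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


# Toy example: hypothetical 2-d embeddings for text, image, and target stance.
text_vec, image_vec, stance_vec = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
fused = weighted_fusion([text_vec, image_vec, stance_vec], [0.0, 0.0, 0.0])
```

With equal scores the fusion reduces to a plain average of the three vectors; raising one score shifts the fused representation toward that modality, which is the general idea a weighted-fusion scheme relies on.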

