CVMM · Aug 28, 2025

"Humor, Art, or Misinformation?": A Multimodal Dataset for Intent-Aware Synthetic Image Detection

arXiv:2508.20670v2 · 1 citation · h-index: 16 · Proceedings of the 2nd International Workshop on Diffusion of Harmful Content on Online Web
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of intent-aware synthetic image detection for social media platforms and content moderators, though it is an incremental contribution that builds on existing multimodal detection efforts.

The paper tackled the problem of detecting the intent behind AI-generated images, such as humor, art, or misinformation, by introducing the S-HArM dataset with 9,576 labeled image-text pairs and exploring synthetic training strategies. The results showed that models trained on image- and multimodally-guided data generalized better to real-world content, but overall performance remained limited.

Recent advances in multimodal AI have enabled progress in detecting synthetic and out-of-context content. However, existing efforts largely overlook the intent behind AI-generated images. To fill this gap, we introduce S-HArM, a multimodal dataset for intent-aware classification, comprising 9,576 "in the wild" image-text pairs from Twitter/X and Reddit, labeled as Humor/Satire, Art, or Misinformation. Additionally, we explore three prompting strategies (image-guided, description-guided, and multimodally-guided) to construct a large-scale synthetic training dataset with Stable Diffusion. We conduct an extensive comparative study including modality fusion, contrastive learning, reconstruction networks, attention mechanisms, and large vision-language models. Our results show that models trained on image- and multimodally-guided data generalize better to "in the wild" content, due to preserved visual context. However, overall performance remains limited, highlighting the complexity of inferring intent and the need for specialized architectures.
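The abstract names modality fusion among the compared approaches but does not describe how it works. The snippet below is a minimal sketch under stated assumptions. It assumes image and text embeddings have already been produced by some pretrained encoders (not shown). It concatenates the two vectors (feature-level fusion) and applies a linear softmax head over the three S-HArM labels. The function names, toy weights, and label order are hypothetical. This is a generic illustration, not the authors' architecture.

```python
import math
from typing import List, Sequence, Tuple

# Label set from the S-HArM description; the index order here is an assumption.
LABELS = ["Humor/Satire", "Art", "Misinformation"]


def early_fusion(image_emb: Sequence[float], text_emb: Sequence[float]) -> List[float]:
    """Feature-level fusion: concatenate image and text embeddings into one vector."""
    return list(image_emb) + list(text_emb)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_intent(
    image_emb: Sequence[float],
    text_emb: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
) -> Tuple[str, List[float]]:
    """Hypothetical linear head over the fused vector; one weight row per label."""
    fused = early_fusion(image_emb, text_emb)
    if len(weights) != len(LABELS) or len(biases) != len(LABELS):
        raise ValueError("need exactly one weight row and one bias per label")
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("each weight row must match the fused dimension")
    # One logit per intent label: dot(row, fused) + bias.
    logits = [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weights, biases)]
    probs = softmax(logits)
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], probs
```

With hand-picked toy weights, `classify_intent([1.0, 0.0], [0.0, 1.0], [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 2]], [0, 0, 0])` produces logits of 1, 0, and 2, so the predicted label is "Misinformation". In a real system the weights would be learned from labeled image-text pairs. Concatenation is only one of several possible fusion designs.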
