Sens-VisualNews: A Benchmark Dataset for Sensational Image Detection
This work provides a new benchmark for detecting sensational visual content in news, a capability relevant to disinformation detection and content moderation. However, the task is narrowly defined and the results are preliminary.
The paper introduces the task of sensational image detection and presents Sens-VisualNews, a benchmark dataset of 9,576 news images annotated for sensational content. Experiments show that current Multimodal LLMs struggle with this task, with fine-tuned models achieving only moderate performance.
The detection of sensational content in media items can be a critical filtering mechanism for identifying check-worthy content and flagging potential disinformation, since such content triggers physiological arousal that often bypasses critical evaluation and accelerates viral sharing. In this paper, we introduce the task of sensational image detection, which aims to determine whether an image contains shocking, provocative, or emotionally charged features intended to grab attention and trigger strong emotional responses. To support research on this task, we create a new benchmark dataset, Sens-VisualNews, that contains 9,576 images from news items, annotated for the presence or absence of various sensational concepts and events in their visual content. Finally, using Sens-VisualNews, we study the prompt sensitivity, performance, and robustness of a wide range of open state-of-the-art Multimodal LLMs in both zero-shot and fine-tuned settings.
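As an illustration of what a prompt-sensitivity study of this kind might measure, the following minimal sketch scores binary predictions (1 = sensational, 0 = not sensational) obtained under several prompt variants and reports how much the score varies across them. The abstract does not state which metric or aggregation is used, so the choice of F1, the spread statistics, the function names, and the toy labels and predictions below are all assumptions made for illustration only.

```python
from statistics import mean, pstdev


def f1_score(labels, preds):
    """Binary F1 for the positive class (assumed here: 1 = sensational)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def prompt_sensitivity(labels, preds_by_prompt):
    """Score each prompt variant and summarize the spread across variants."""
    scores = {name: f1_score(labels, preds)
              for name, preds in preds_by_prompt.items()}
    values = list(scores.values())
    return {
        "per_prompt_f1": scores,
        "mean_f1": mean(values),
        "std_f1": pstdev(values),            # population std over prompts
        "range_f1": max(values) - min(values),
    }


# Hypothetical toy data, not taken from Sens-VisualNews.
labels = [1, 0, 1, 1, 0, 0]
preds_by_prompt = {
    "prompt_a": [1, 0, 1, 0, 0, 0],
    "prompt_b": [1, 1, 1, 1, 0, 1],
}

report = prompt_sensitivity(labels, preds_by_prompt)
print(report)
```

A small range or standard deviation across prompt variants would suggest that a model's judgments are stable under rephrasing, while a large one would suggest prompt sensitivity; other metrics such as accuracy could be substituted in the same structure.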