MuSaRoNews: A Multidomain, Multimodal Satire Dataset from Romanian News Articles
This work addresses the challenge of distinguishing satire from fake news in Romanian media; its contribution is incremental, since it extends existing multimodal approaches to a new language and domain.
The authors tackled the problem of detecting satire in Romanian news articles by creating MuSaRoNews, a multimodal dataset of 117,834 articles, and demonstrated that using both text and visual modalities improves detection performance.
Satire and fake news can both contribute to the spread of false information, even though they have different purposes (one is for amusement, the other is to misinform). However, relying purely on text is not enough to detect the incongruity between the surface meaning and the actual meaning of a news article; other sources of information (e.g., visual) often provide an important clue for satire detection. This work introduces MuSaRoNews, a multimodal corpus for satire detection in Romanian news articles. Specifically, we gathered 117,834 public news articles from real and satirical news sources, composing the first multimodal corpus for satire detection in the Romanian language. We conducted experiments and showed that using both modalities improves performance.
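The abstract does not say how the text and visual modalities are combined, so the following is only a minimal sketch of one common option, fusion by concatenating per-article feature vectors and scoring them with a logistic function. The function names, the toy feature values, and the weights are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def fuse_features(text_vec: Sequence[float], image_vec: Sequence[float]) -> List[float]:
    """Concatenate text and image features into one joint vector (assumed fusion scheme)."""
    return list(text_vec) + list(image_vec)


def satire_probability(features: Sequence[float],
                       weights: Sequence[float],
                       bias: float = 0.0) -> float:
    """Logistic score over the fused features; the weights are hypothetical, not learned here."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(f * w for f, w in zip(features, weights))
    return 1.0 / (1.0 + math.exp(-z))


# Toy inputs standing in for encoder outputs (illustrative values only).
text_vec = [0.2, -0.1]
image_vec = [0.7]

fused = fuse_features(text_vec, image_vec)
# With all-zero weights the classifier is uninformative.
prob = satire_probability(fused, weights=[0.0, 0.0, 0.0])
```

In practice the two vectors would come from pretrained text and image encoders and the weights would be learned from labeled articles; a text-only baseline corresponds to dropping `image_vec` from the fused vector.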