CL · Aug 31, 2025

SeLeRoSa: Sentence-Level Romanian Satire Detection Dataset

Răzvan-Alexandru Smădu, Andreea Iuga, Dumitru-Clementin Cercel, Florin Pop

arXiv:2509.00893v22.7 · h-index: 13 · CIKM

Originality: Synthesis-oriented

AI Analysis

This work addresses sentence-level satire detection in Romanian news. The contribution is incremental: it builds on existing satire detection efforts by targeting a new language and a finer granularity.

The authors introduce SeLeRoSa, the first sentence-level dataset for Romanian satire detection in news articles, comprising 13,873 manually annotated sentences. They evaluate LLM-based baselines in zero-shot and fine-tuning settings alongside transformer-based baselines, and find that current models have limitations on this task, indicating a need for further research.

Satire, irony, and sarcasm are techniques typically used to express humor and critique, rather than deceive; however, they can occasionally be mistaken for factual reporting, akin to fake news. These techniques can be applied at a more granular level, allowing satirical information to be incorporated into news articles. In this paper, we introduce the first sentence-level dataset for Romanian satire detection for news articles, called SeLeRoSa. The dataset comprises 13,873 manually annotated sentences spanning various domains, including social issues, IT, science, and movies. With the rise and recent progress of large language models (LLMs) in the natural language processing literature, LLMs have demonstrated enhanced capabilities to tackle various tasks in zero-shot settings. We evaluate multiple baseline models based on LLMs in both zero-shot and fine-tuning settings, as well as baseline transformer-based models. Our findings reveal the current limitations of these models in the sentence-level satire detection task, paving the way for new research directions.
