What Makes You CLIC: Detection of Croatian Clickbait Headlines
This paper addresses clickbait detection for Croatian media, an incremental contribution since it applies existing methods to a less-resourced language.
The paper tackles clickbait detection in Croatian news headlines by compiling a novel dataset (CLIC) and comparing fine-tuned BERTić models with LLM-based in-context learning methods. It finds that the fine-tuned models perform better and that nearly half of the headlines contain clickbait.
Online news outlets operate predominantly on an advertising-based revenue model, which pushes journalists to write headlines that are often scandalous, intriguing, and provocative, commonly referred to as clickbait. Automatic detection of clickbait headlines is essential for preserving information quality and reader trust in digital media, and it requires both contextual understanding and world knowledge. For this task, particularly in less-resourced languages, it remains unclear whether fine-tuned methods or in-context learning (ICL) yield better results. In this paper, we compile CLIC, a novel dataset for clickbait detection in Croatian news headlines that spans a 20-year period and covers both mainstream and fringe outlets. We fine-tune the BERTić model on this task and compare its performance to LLM-based ICL methods with prompts in both Croatian and English. Finally, we analyze the linguistic properties of clickbait. We find that nearly half of the analyzed headlines contain clickbait, and that fine-tuned models deliver better results than general-purpose LLMs.
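To make the ICL setup more concrete, here is a minimal sketch of how a few-shot prompt might be assembled in either English or Croatian, and how an LLM's free-text reply could be mapped back to a binary clickbait label. The template wording, the `yes`/`no` (`da`/`ne`) label words, and the names `TEMPLATES`, `build_prompt`, and `parse_answer` are hypothetical illustrations, not the paper's actual prompts; demonstration selection and the LLM call itself are left out.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical prompt templates (assumption): NOT the prompts used in the paper.
TEMPLATES = {
    "en": {
        "instruction": "Decide whether each news headline is clickbait. Answer 'yes' or 'no'.",
        "headline": "Headline",
        "answer": "Answer",
        "labels": {True: "yes", False: "no"},
    },
    "hr": {
        "instruction": "Odluči je li svaki novinski naslov clickbait. Odgovori s 'da' ili 'ne'.",
        "headline": "Naslov",
        "answer": "Odgovor",
        "labels": {True: "da", False: "ne"},
    },
}


@dataclass
class Example:
    """A labeled demonstration headline (hypothetical structure)."""
    headline: str
    is_clickbait: bool


def build_prompt(query: str, demos: List[Example], lang: str = "en") -> str:
    """Assemble an instruction, labeled demonstrations, and the unlabeled query headline."""
    t = TEMPLATES[lang]
    parts = [t["instruction"], ""]
    for d in demos:
        parts.append(f"{t['headline']}: {d.headline}")
        parts.append(f"{t['answer']}: {t['labels'][d.is_clickbait]}")
        parts.append("")  # blank line between demonstrations
    parts.append(f"{t['headline']}: {query}")
    parts.append(f"{t['answer']}:")  # the model is expected to complete this line
    return "\n".join(parts)


def parse_answer(completion: str, lang: str = "en") -> Optional[bool]:
    """Map the first word of a model reply to True/False; return None if unrecognized."""
    words = completion.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,!'\"")
    for label, word in TEMPLATES[lang]["labels"].items():
        if first == word:
            return label
    return None
```

Unparseable replies return `None`, so they can be counted separately instead of being silently treated as non-clickbait, which matters when comparing ICL output against a fine-tuned classifier that always emits a label.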