CL, AI
Jun 8, 2024

ThatiAR: Subjectivity Detection in Arabic News Sentences

arXiv:2406.05559v1
9 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses media bias and misinformation for Arabic-language readers, but it is incremental because it applies existing methods to a new language.

The study tackled subjectivity detection in Arabic news by creating the first large dataset of ~3.6K manually annotated sentences and benchmarking models, finding that LLMs with in-context learning performed better.

Detecting subjectivity in news sentences is crucial for identifying media bias, enhancing credibility, and combating misinformation by flagging opinion-based content. It provides insights into public sentiment, empowers readers to make informed decisions, and encourages critical thinking. While research has developed methods and systems for this purpose, most efforts have focused on English and other high-resource languages. In this study, we present the first large dataset for subjectivity detection in Arabic, consisting of ~3.6K manually annotated sentences along with GPT-4o-based explanations. In addition, we include instructions (in both English and Arabic) to facilitate LLM-based fine-tuning. We provide an in-depth analysis of the dataset and the annotation process, as well as extensive benchmark results covering both PLMs and LLMs. Our analysis of the annotation process highlights that annotators were strongly influenced by their political, cultural, and religious backgrounds, especially at the beginning of the annotation process. The experimental results suggest that LLMs with in-context learning provide better performance. We aim to release the dataset and resources for the community.
