CL Jan 27, 2023

Predicting Sentence-Level Factuality of News and Bias of Media Outlets

Francielle Vargas, Kokil Jaidka, Thiago A. S. Pardo, Fabrício Benevenuto

arXiv:2301.11850v4 21.4136 citations h-index: 52 Has Code

Originality: Synthesis-oriented

AI Analysis

The paper addresses the need for scalable fact-checking and bias detection in news, particularly for Brazilian Portuguese, but its contribution is incremental: it applies existing methods to a new dataset.

The paper tackles automated news credibility assessment by introducing FactNews, a large sentence-level dataset of 6,191 sentences annotated for factuality and media bias in Brazilian Portuguese. It shows that biased sentences tend to be longer and more emotional than factual ones, and it reports promising results for predicting the reliability of media outlets.

Automated news credibility assessment and fact-checking at scale require accurately predicting news factuality and media bias. This paper introduces a large sentence-level dataset, titled "FactNews", composed of 6,191 sentences expertly annotated according to the factuality and media bias definitions proposed by AllSides. We use FactNews to assess the overall reliability of news sources by formulating two text classification problems: predicting the sentence-level factuality of news reporting and predicting the bias of media outlets. Our experiments demonstrate that biased sentences contain more words than factual sentences and show a predominance of emotions. Hence, the fine-grained analysis of subjectivity and impartiality of news articles provided promising results for predicting the reliability of media outlets. Finally, due to the severity of fake news and political polarization in Brazil, and the lack of research for Portuguese, both the dataset and the baseline were proposed for Brazilian Portuguese.
