In Search of Credible News
This work addresses the proliferation of fake news in social media, especially for understudied non-English languages, by providing new datasets and a detection method.
The authors tackled the problem of detecting fake online news by collecting three new balanced datasets for non-English languages and proposing a language-independent model using linguistic, credibility-related, and semantic features. Their experiments showed the model achieves very high accuracy in distinguishing credible from fake news.
We study the problem of finding fake online news. This is an important problem, as news of questionable credibility has recently been proliferating in social media at an alarming scale. As this is an understudied problem, especially for languages other than English, we first collect and release to the research community three new balanced credible vs. fake news datasets derived from four online sources. We then propose a language-independent approach for automatically distinguishing credible from fake news, based on a rich feature set. In particular, we use linguistic (n-gram), credibility-related (capitalization, punctuation, pronoun use, sentiment polarity), and semantic (embeddings and DBpedia data) features. Our experiments on three different test sets show that our model can distinguish credible from fake news with very high accuracy.
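As a rough illustration of the credibility-related feature group named above (capitalization, punctuation, pronoun use), the following minimal sketch computes a few surface statistics from a raw text. It is not the authors' implementation: the function name `credibility_features`, the feature names, and the short English pronoun list are all assumptions made for this example, and sentiment polarity is omitted because it would require an external lexicon.

```python
import re
from typing import Dict

# Hypothetical pronoun list (English, for illustration only); a real system
# would substitute a list for the target language.
PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your",
            "he", "she", "they", "them"}


def credibility_features(text: str) -> Dict[str, float]:
    """Compute simple capitalization, punctuation, and pronoun statistics."""
    tokens = re.findall(r"\w+", text)
    letters = [c for c in text if c.isalpha()]
    # Guard against division by zero on empty input.
    n_tokens = max(len(tokens), 1)
    n_letters = max(len(letters), 1)
    return {
        # Share of alphabetic characters that are upper case.
        "upper_char_ratio": sum(c.isupper() for c in letters) / n_letters,
        # Share of tokens written entirely in capitals (length > 1).
        "all_caps_token_ratio": sum(
            t.isupper() and len(t) > 1 for t in tokens
        ) / n_tokens,
        # Raw counts of emphatic punctuation.
        "exclamation_count": float(text.count("!")),
        "question_count": float(text.count("?")),
        # Share of tokens that appear in the pronoun list.
        "pronoun_ratio": sum(t.lower() in PRONOUNS for t in tokens) / n_tokens,
    }


if __name__ == "__main__":
    print(credibility_features("SHOCKING!!! You won't believe it?"))
```

Features of this kind would typically be concatenated with the n-gram and semantic features before being passed to a classifier; the choice of classifier is not specified here.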