UrduFake@FIRE2020: Shared Track on Fake News Identification in Urdu
This work addresses fake news identification in Urdu, but its contribution is incremental: it applies existing methods to a dataset in a new language.
The paper tackles fake news detection in Urdu by organizing a shared task with a dataset of 900 training and 400 test articles across five domains; the best system achieved an F-score of 0.90 using a BERT-based approach.
This paper gives an overview of the first shared task at FIRE 2020 on fake news detection in the Urdu language. It is a binary classification task in which the goal is to identify fake news, using a dataset of 900 annotated news articles for training and 400 news articles for testing. The dataset contains news from five domains: (i) Health, (ii) Sports, (iii) Showbiz, (iv) Technology, and (v) Business. A total of 42 teams from 6 countries (India, China, Egypt, Germany, Pakistan, and the UK) registered for the task, and 9 teams submitted experimental results. The participants used a range of methods, from traditional feature-based machine learning to neural network techniques. The best-performing system achieved an F-score of 0.90, showing that the BERT-based approach outperformed the other machine learning classifiers submitted to the task.
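As a rough illustration of the evaluation measure, the F-score for a binary fake/real task can be computed from gold and predicted labels. The sketch below assumes the standard F1 definition (harmonic mean of precision and recall) with the fake class treated as positive; the abstract does not say which F-score variant or averaging scheme was used, and the function name, label strings, and toy data are hypothetical.

```python
def f1_score_binary(gold, pred, positive="fake"):
    """F1 (harmonic mean of precision and recall) for one positive class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: define F1 as 0 to avoid division by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels invented for illustration only (not taken from the shared-task data).
gold = ["fake", "fake", "real", "real", "fake"]
pred = ["fake", "real", "real", "fake", "fake"]
score = f1_score_binary(gold, pred)
```

If the task reports a macro-averaged F-score, which is an assumption here, the same function could be called once with `positive="fake"` and once with `positive="real"`, and the two values averaged.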