CL LG SI
Oct 14, 2020

No Rumours Please! A Multi-Indic-Lingual Approach for COVID Fake-Tweet Detection

Debanjana Kar, Mohit Bhardwaj, Suranjana Samanta, Amar Prakash Azad

arXiv:2010.06906v13.076 citations
Has Code

Originality: Incremental advance

AI Analysis

This paper addresses misinformation in low-resource Indic languages during the COVID-19 pandemic. The advance is incremental: it adapts existing BERT methods to new languages and data.

The paper tackles COVID-19 fake news detection in tweets across multiple Indic languages, achieving around 89% F-Score overall. It establishes the first benchmarks for Hindi and Bengali: 79% and 81% F-Score respectively with annotated data, and 81% and 78% F-Score respectively with zero-shot learning.

The sudden, widespread menace of the present global COVID-19 pandemic has had an unprecedented effect on our lives. Mankind is experiencing immense fear and depending on social media like never before. Fear inevitably leads to panic, speculation, and the spread of misinformation. Many governments have taken measures to curb the spread of such misinformation for public well-being. Besides global measures, systems for demographically local languages have an important role to play in achieving effective outreach. To this end, we propose an approach to detect fake news about COVID-19 early on from social media, such as tweets, for multiple Indic languages besides English. In addition, we create an annotated dataset of Hindi and Bengali tweets for fake news detection. We propose a BERT-based model augmented with additional relevant features extracted from Twitter to identify fake tweets. To extend our approach to multiple Indic languages, we use an mBERT-based model fine-tuned on the dataset we created in Hindi and Bengali. We also propose a zero-shot learning approach to alleviate data scarcity for such low-resource languages. Through rigorous experiments, we show that our approach reaches around 89% F-Score in fake tweet detection, which surpasses the state-of-the-art (SOTA) results. Moreover, we establish the first benchmark for two Indic languages, Hindi and Bengali. Using our annotated data, our model achieves about 79% F-Score for Hindi and 81% F-Score for Bengali tweets. Our zero-shot model achieves about 81% F-Score for Hindi and 78% F-Score for Bengali tweets without any annotated data, which indicates the efficacy of our approach.
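
All results above are reported as F-Scores. As a reminder of what that metric measures, here is a minimal sketch of a binary F1 computation for fake-tweet labels. It assumes "fake" is the positive class (encoded as 1) and uses made-up labels. The abstract does not say which averaging scheme the authors use, so the function name `f_score` and the example data are purely illustrative.

```python
def f_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (1 = fake tweet, 0 = genuine tweet), not data from the paper.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
score = f_score(y_true, y_pred)
print(f"F-Score: {score:.2f}")
```

In this toy case there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and so is the F-Score.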
