Constraint 2021: Machine Learning Models for COVID-19 Fake News Detection Shared Task
This work addresses the problem of COVID-19 fake news detection for social media users and platforms, providing an incremental solution within a shared task.
This paper describes a system for classifying COVID-19 related social media posts as fake or real, achieving a weighted average F1 score of 95.19% on test data using a linear SVM with linguistic features. This places it in the middle of the leaderboard (80th out of 167 participants) for the Constraint 2021 shared task.
In this system paper, we present our contribution to the Constraint 2021 COVID-19 Fake News Detection Shared Task, which poses the challenge of classifying COVID-19 related social media posts as either fake or real. We address this challenge by applying classical machine learning algorithms together with several linguistic features, such as n-grams, readability, emotional tone and punctuation. For pre-processing, we experiment with steps including stop word removal, stemming/lemmatization and link removal. Our best performing system is based on a linear SVM and obtains a weighted average F1 score of 95.19% on test data, placing it in the middle of the leaderboard (place 80 of 167).
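To make the feature side of this description concrete, the following is a minimal sketch of link removal, optional stop word removal, word n-gram counting and punctuation counting in plain Python. It is an illustration under stated assumptions, not the system's actual code: the function names (`preprocess`, `word_ngrams`, `extract_features`), the regular expressions, the choice of unigrams and bigrams, and the restriction to `!` and `?` are all hypothetical. Readability and emotional tone features are omitted because the abstract does not say how they are computed.

```python
import re
from collections import Counter

# Assumed patterns; the abstract does not specify the tokenization rules.
URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def preprocess(text):
    """Remove links and lowercase the post (two of the steps named above)."""
    return URL_RE.sub(" ", text).lower()


def word_ngrams(tokens, n):
    """Return all contiguous word n-grams of length n as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, stop_words=frozenset()):
    """Map one post to a sparse count dictionary (hypothetical feature names)."""
    cleaned = preprocess(text)
    tokens = [t for t in TOKEN_RE.findall(cleaned) if t not in stop_words]

    features = Counter()
    # Unigram and bigram counts; the n-gram range here is an assumption.
    for n in (1, 2):
        for gram in word_ngrams(tokens, n):
            features["ngram=" + gram] += 1
    # Punctuation counts, taken after link removal so URL characters are ignored.
    for mark in "!?":
        features["punct=" + mark] = cleaned.count(mark)
    return features


if __name__ == "__main__":
    post = "BREAKING!!! Cure found https://t.co/x?y=1"
    for name, value in sorted(extract_features(post).items()):
        print(name, value)
```

Dictionaries of this shape could then be vectorized and passed to a linear SVM, for example with scikit-learn's `DictVectorizer` and `LinearSVC`. Which library the system actually uses is not stated in the abstract, so that pairing is only one plausible choice.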