tmn at #SMM4H 2023: Comparing Text Preprocessing Techniques for Detecting Tweets Self-reporting a COVID-19 Diagnosis
This work addresses the challenge of identifying COVID-19 self-reports on social media for public health monitoring. Its contribution is incremental, since it compares preprocessing techniques within a single, specific task.
The paper tackles the problem of automatically detecting tweets that self-report a COVID-19 diagnosis. An ensemble of fine-tuned transformer models achieves an F1-score of 84.5%, which the authors report as 4.1% higher than the average value.
The paper describes a system developed for Task 1 at SMM4H 2023. The goal of the task is to automatically distinguish tweets that self-report a COVID-19 diagnosis (for example, a positive test, clinical diagnosis, or hospitalization) from those that do not. We investigate the use of different techniques for preprocessing tweets using four transformer-based models. The ensemble of fine-tuned language models obtained an F1-score of 84.5%, which is 4.1% higher than the average value.
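To make the two ideas in the abstract concrete, the following is a minimal Python sketch of tweet preprocessing followed by ensembling of per-model predictions. It is illustrative only: the abstract does not say which preprocessing techniques were compared or how the ensemble combines its models, so the normalization steps (placeholder tokens `HTTPURL` and `@USER`, optional lowercasing and hashtag stripping), the majority-vote rule, the tie-breaking choice, and the function names `preprocess_tweet` and `majority_vote` are all assumptions rather than the authors' method.

```python
import re

# Assumed normalization patterns; the paper's actual techniques are not
# specified in the abstract.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
WHITESPACE_RE = re.compile(r"\s+")


def preprocess_tweet(text, lowercase=False, strip_hash=False):
    """Normalize a tweet (hypothetical helper, not the authors' pipeline)."""
    if lowercase:
        # Lowercase first so the placeholder tokens below keep their case.
        text = text.lower()
    text = URL_RE.sub("HTTPURL", text)      # replace links with a placeholder
    text = MENTION_RE.sub("@USER", text)    # anonymize user mentions
    if strip_hash:
        text = HASHTAG_RE.sub(r"\1", text)  # "#covid19" -> "covid19"
    return WHITESPACE_RE.sub(" ", text).strip()


def majority_vote(predictions):
    """Combine binary labels from several models by majority vote.

    `predictions` holds one list of 0/1 labels per model, all of equal
    length. Ties (possible with an even number of models) are resolved
    in favor of the positive class, which is an assumption made here.
    """
    n_models = len(predictions)
    return [1 if 2 * sum(votes) >= n_models else 0
            for votes in zip(*predictions)]


if __name__ == "__main__":
    tweet = "Tested positive for #COVID19 today @bob https://t.co/abc"
    print(preprocess_tweet(tweet, strip_hash=True))
    # Four hypothetical models voting on three tweets.
    print(majority_vote([[1, 0, 1], [1, 0, 0], [0, 0, 1], [1, 1, 0]]))
```

Comparing preprocessing variants would then amount to fine-tuning each model on the output of `preprocess_tweet` under different flag settings and comparing F1-scores on held-out data.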