Textual Analysis of Communications in COVID-19 Infected Community on Social Media
This work provides tools for pandemic-related social media research by enabling automated classification of posts, though it is incremental as it applies existing methods to new data.
The study analyzed linguistic characteristics of discussions on the COVID-19 subreddit r/COVID19positive, finding differences in psychological, emotional, and reasoning aspects across three topic categories, and used state-of-the-art pre-trained language models to classify posts into these categories.
During the COVID-19 pandemic, people began discussing pandemic-related topics on social media. On the subreddit \textit{r/COVID19positive}, users share and discuss a range of topics, including the experiences of those who received a positive test result, stories from those who presumably got infected, and questions about the pandemic and the disease. In this study, we seek to understand the nature of discussions on the subreddit from a linguistic perspective. We found differences in linguistic characteristics (e.g., psychological, emotional, and reasoning) across three categories of topics. We also classified posts into these categories using state-of-the-art pre-trained language models. Such a classification model can be used for pandemic-related research on social media.