Automated clustering of COVID-19 anti-vaccine discourse on Twitter
This work addresses the challenge of monitoring and mitigating online vaccine disinformation in support of public health initiatives; it is incremental, building directly on prior research.
The study identifies anti-vaccination discourse on Twitter by analyzing 1.3 million original tweets and 18 million retweets posted between December 2019 and June 2020, and it develops text classifiers that detect such language and could contribute to an early-warning mechanism.
Attitudes about vaccination have become more polarized, and vaccine disinformation and fringe conspiracy theories are now common online. Ojea Quintana et al. (2021) present an observational study of Twitter vaccine discourse: the authors analyzed approximately six months of activity, comprising 1.3 million original tweets and 18 million retweets posted between December 2019 and June 2020, a period spanning the time before and after COVID-19 was established as a pandemic. This work expands upon Ojea Quintana et al. (2021) with two main contributions from data science. First, building on the authors' initial network clustering and qualitative analysis, we clearly demarcate and visualize the language patterns used by Antivaxxers (anti-vaccination campaigners and vaccine deniers) versus all other clusters (collectively, Others). Second, using the characteristics of Antivaxxers' tweets, we develop text classifiers that estimate the likelihood that a given user is employing anti-vaccination language, ultimately contributing to an early-warning mechanism that improves the health of our epistemic environment and bolsters, rather than hinders, public health initiatives.
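To make the second contribution concrete, the following is a minimal sketch of how a text classifier can turn tweets into a per-user likelihood score. It is an illustration only: the abstract does not state which classifier family, features, or aggregation rule the study uses, so the multinomial naive Bayes model, the whitespace tokenizer, the averaging in `user_score`, and the tiny training examples below are all assumptions invented for this sketch, not the authors' data or method.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real pipeline
    # would also handle URLs, hashtags, mentions, and punctuation.
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-alpha smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict_proba(self, text):
        # Tokens never seen in training are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            denom = total_words + self.alpha * len(self.vocab)
            score = math.log(self.doc_counts[c] / total_docs)
            for tok in tokens:
                score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
            log_scores[c] = score
        # Normalize in log space to avoid underflow on long texts.
        top = max(log_scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return {c: v / norm for c, v in exp_scores.items()}


def user_score(clf, tweets, target="antivaxxer"):
    # Assumption: a user's score is the mean per-tweet probability.
    return sum(clf.predict_proba(t)[target] for t in tweets) / len(tweets)


# Hypothetical toy examples, invented for illustration (not study data).
train_texts = [
    "vaccines cause harm hoax",
    "mandatory vaccine is a hoax plot",
    "got my flu shot today",
    "vaccine trial results look promising",
]
train_labels = ["antivaxxer", "antivaxxer", "other", "other"]

clf = NaiveBayesTextClassifier().fit(train_texts, train_labels)
probs = clf.predict_proba("this vaccine is a hoax")
score = user_score(clf, ["this vaccine is a hoax", "got my flu shot"])
```

In this sketch, cluster labels from the network analysis would play the role of `train_labels`, and a user whose `user_score` exceeds some chosen threshold could be flagged for review. The threshold, like the rest of the design, is a hypothetical choice that would need validation against held-out data.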