CAVES: A Dataset to facilitate Explainable Classification and Summarization of Concerns towards COVID Vaccines
CAVES gives researchers who study vaccine hesitancy a labelled resource, though the contribution is incremental: it extends existing social media analysis with concern-specific labels and explanations.
The authors address the problem of identifying specific anti-vaccine concerns on social media by curating CAVES, a dataset of about 10k COVID-19 anti-vaccine tweets annotated with multi-label concerns and explanations. State-of-the-art models achieve only moderate scores on it, which indicates that the task is challenging.
Convincing people to get vaccinated against COVID-19 is a key societal challenge at present. As a first step towards this goal, many prior works have relied on social media analysis to understand the specific concerns that people have about these vaccines, such as potential side-effects, ineffectiveness, and political factors. Though there are datasets that broadly classify social media posts as Anti-vax or Pro-vax, there is no dataset (to our knowledge) that labels social media posts according to the specific anti-vaccine concerns they mention. In this paper, we curate CAVES, the first large-scale dataset containing about 10k COVID-19 anti-vaccine tweets labelled with specific anti-vaccine concerns in a multi-label setting. This is also the first multi-label classification dataset that provides explanations for each of the labels. Additionally, the dataset provides class-wise summaries of all the tweets. We perform preliminary experiments on the dataset and show that it is very challenging for multi-label explainable classification and tweet summarization, as evidenced by the moderate scores achieved by some state-of-the-art models. Our dataset and code are available at: https://github.com/sohampoddar26/caves-data
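To make the multi-label setting with explanations concrete, the following is a minimal sketch of how one annotated tweet might be represented and turned into a multi-hot target vector. It does not reflect the actual CAVES file format: the `LabelledTweet` class, the `multi_hot` helper, the three label names (borrowed from the example concerns mentioned above), and the example tweet are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical label set, using only the example concerns named in the text.
# The real dataset defines its own (larger) set of concern classes.
LABELS = ["side-effect", "ineffective", "political"]


@dataclass
class LabelledTweet:
    """One tweet with several concern labels, each backed by an explanation."""
    text: str
    # Maps each assigned label to the text span that justifies it.
    explanations: dict = field(default_factory=dict)


def multi_hot(tweet, label_order=LABELS):
    """Encode the tweet's label set as a 0/1 vector in a fixed label order."""
    return [1 if label in tweet.explanations else 0 for label in label_order]


# Made-up tweet for illustration only; it is not taken from the dataset.
example = LabelledTweet(
    text="The shot gave me a fever and it does not even stop infection.",
    explanations={
        "side-effect": "gave me a fever",
        "ineffective": "does not even stop infection",
    },
)

target = multi_hot(example)
print(target)
```

Storing the explanation span alongside each label, rather than keeping a single rationale per tweet, is one simple way to keep the justification tied to the label it supports when a tweet carries more than one concern.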