CL · IR · Apr 26, 2022

CoVERT: A Corpus of Fact-checked Biomedical COVID-19 Tweets

arXiv:2204.12164v1 · 597 citations · h-index: 32
Originality: Synthesis-oriented
AI Analysis

This paper provides a domain-specific dataset for fact-checking biomedical COVID-19 misinformation. The contribution is incremental: it fills a gap in existing resources.

The authors address misinformation in biomedical COVID-19 tweets by creating CoVERT, a corpus of 300 fact-checked tweets annotated with medical named entities and relations. In their fact-checking pipeline, the evidence retrieved by crowdworkers proves more useful than the knowledge indirectly available in pretrained language models.

Over the course of the COVID-19 pandemic, large volumes of biomedical information concerning this new disease have been published on social media. Some of this information can pose a real danger to people's health, particularly when false information is shared, for instance recommendations on how to treat diseases without professional medical advice. Therefore, automatic fact-checking resources and systems developed specifically for the medical domain are crucial. While existing fact-checking resources cover COVID-19-related information in news or quantify the amount of misinformation in tweets, there is no dataset providing fact-checked COVID-19-related Twitter posts with detailed annotations for biomedical entities, relations and relevant evidence. We contribute CoVERT, a fact-checked corpus of tweets with a focus on the domain of biomedicine and COVID-19-related (mis)information. The corpus consists of 300 tweets, each annotated with medical named entities and relations. We employ a novel crowdsourcing methodology to annotate all tweets with fact-checking labels and supporting evidence, which crowdworkers search for online. This methodology results in moderate inter-annotator agreement. Furthermore, we use the retrieved evidence extracts as part of a fact-checking pipeline, finding that the real-world evidence is more useful than the knowledge indirectly available in pretrained language models.

