A Benchmark Dataset of Check-worthy Factual Claims
This paper provides a benchmark dataset for researchers developing computational fact-checking methods; its contribution is incremental, since it focuses on data collection rather than new algorithms.
The authors created the ClaimBuster dataset of 23,533 annotated statements from U.S. presidential debates to help identify fact-check-worthy claims, making it publicly available for research.
In this paper, we present the ClaimBuster dataset of 23,533 statements extracted from all U.S. general election presidential debates and annotated by human coders. The ClaimBuster dataset can be leveraged in building computational methods to identify claims that are worth fact-checking from the myriad sources of digital or traditional media. The dataset is publicly available to the research community at http://doi.org/10.5281/zenodo.3609356.