CL · Mar 14, 2018

FEVER: a large-scale dataset for Fact Extraction and VERification

James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal

arXiv:1803.05355v3

Originality: Incremental advance

AI Analysis

FEVER gives researchers and developers a benchmark for building and evaluating automated fact-checking systems that verify claims against textual sources.

The authors introduce FEVER, a large-scale dataset of 185,445 claims for fact verification against Wikipedia. Annotators classified each claim as Supported, Refuted, or NotEnoughInfo, reaching an inter-annotator agreement of 0.6841 Fleiss κ. The authors' pipeline baseline reaches 31.87% accuracy when a claim must be labeled together with the correct evidence, and 50.91% when the evidence is ignored and only the label is scored, which indicates how much room the dataset leaves for better claim verification methods.
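For readers unfamiliar with the agreement statistic quoted above, the following is a minimal sketch of the standard Fleiss κ computation. It is not the authors' code: the function name `fleiss_kappa` and the count-matrix input format are assumptions made for illustration, and the sketch assumes every item was rated by the same number of annotators.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a matrix of rating counts.

    counts[i][j] is the number of annotators who put item i in category j
    (hypothetical input format). Every row must sum to the same number of
    annotators n, with n >= 2.
    """
    if not counts:
        raise ValueError("need at least one rated item")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_cats = len(counts[0])

    # Observed agreement: for each item, the fraction of annotator pairs
    # that agree, averaged over all items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_raters
    p_expected = sum(
        (sum(row[j] for row in counts) / total) ** 2 for j in range(n_cats)
    )
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Toy data: 3 claims, 3 annotators, columns are
    # (Supported, Refuted, NotEnoughInfo).
    toy = [[3, 0, 0], [0, 2, 1], [0, 0, 3]]
    print(f"kappa = {fleiss_kappa(toy):.4f}")
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so the reported 0.6841 sits well above chance but short of full consensus.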

In this paper we introduce a new publicly available dataset for verification against textual sources, FEVER: Fact Extraction and VERification. It consists of 185,445 claims generated by altering sentences extracted from Wikipedia and subsequently verified without knowledge of the sentence they were derived from. The claims are classified as Supported, Refuted or NotEnoughInfo by annotators achieving 0.6841 in Fleiss $κ$. For the first two classes, the annotators also recorded the sentence(s) forming the necessary evidence for their judgment. To characterize the challenge of the dataset presented, we develop a pipeline approach and compare it to suitably designed oracles. The best accuracy we achieve on labeling a claim accompanied by the correct evidence is 31.87%, while if we ignore the evidence we achieve 50.91%. Thus we believe that FEVER is a challenging testbed that will help stimulate progress on claim verification against textual sources.
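The two accuracy figures in the abstract differ only in whether the retrieved evidence is checked. The sketch below illustrates that distinction and is not the authors' scorer. The `Gold` and `Prediction` classes, the `(page title, sentence index)` evidence representation, and the rule that a prediction must contain at least one complete gold evidence set are all assumptions made for illustration; the label strings follow the class names used in the abstract.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set, Tuple

# Assumed representation: a sentence is (Wikipedia page title, sentence index).
Sentence = Tuple[str, int]


@dataclass
class Gold:
    label: str  # "Supported", "Refuted" or "NotEnoughInfo"
    # Each set is one alternative group of sentences that suffices as evidence.
    evidence_sets: List[FrozenSet[Sentence]] = field(default_factory=list)


@dataclass
class Prediction:
    label: str
    evidence: Set[Sentence] = field(default_factory=set)


def _check(gold: List[Gold], pred: List[Prediction]) -> None:
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")


def label_accuracy(gold: List[Gold], pred: List[Prediction]) -> float:
    """Accuracy that ignores evidence: only the predicted label is compared."""
    _check(gold, pred)
    return sum(g.label == p.label for g, p in zip(gold, pred)) / len(gold)


def evidence_accuracy(gold: List[Gold], pred: List[Prediction]) -> float:
    """Accuracy that also requires correct evidence.

    Assumption: a Supported/Refuted prediction counts only if its evidence
    contains every sentence of at least one gold evidence set.
    NotEnoughInfo claims carry no evidence, so only the label is checked.
    """
    _check(gold, pred)
    correct = 0
    for g, p in zip(gold, pred):
        if g.label != p.label:
            continue
        if g.label == "NotEnoughInfo" or any(
            es <= p.evidence for es in g.evidence_sets
        ):
            correct += 1
    return correct / len(gold)
```

Because the evidence-aware score can only drop examples that the label-only score already counts as correct, it is never higher than label accuracy, which matches the ordering of the two reported numbers (31.87% versus 50.91%).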
