CL, IR, LG, ML · Sep 7, 2019

MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims

Isabelle Augenstein, Christina Lioma, Dongsheng Wang, Lucas Chaves Lima, Casper Hansen, Christian Hansen, Jakob Grue Simonsen

arXiv:1909.03242v2

Originality: Incremental advance

AI Analysis

MultiFC provides a challenging testbed for fact-checking research, addressing the need for real-world, multi-domain datasets for training and evaluating automated verification systems.

The authors address automatic claim verification by introducing MultiFC, the largest publicly available dataset of naturally occurring factual claims, collected from 26 English-language fact-checking websites and labeled for veracity by expert journalists. Their best model, which jointly ranks evidence pages and predicts veracity, reaches a Macro F1 of 49.2% and outperforms all baselines; encoding evidence and modelling metadata both yield significant gains.

We contribute the largest publicly available dataset of naturally occurring factual claims for the purpose of automatic claim verification. It is collected from 26 fact-checking websites in English, paired with textual sources and rich metadata, and labelled for veracity by human expert journalists. We present an in-depth analysis of the dataset, highlighting characteristics and challenges. Further, we present results for automatic veracity prediction, both with established baselines and with a novel method for joint ranking of evidence pages and predicting veracity that outperforms all baselines. Significant performance increases are achieved by encoding evidence and by modelling metadata. Our best-performing model achieves a Macro F1 of 49.2%, showing that this is a challenging testbed for claim veracity prediction.
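The headline metric, Macro F1, is the unweighted mean of per-class F1 scores, so rare veracity labels count as much as frequent ones. The sketch below illustrates the metric in Python; it is not the authors' evaluation code, the label strings are hypothetical, and it does not model how scores might be aggregated across MultiFC's 26 source websites.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1  # correct prediction for this class
        else:
            fp[pred] += 1  # predicted class gains a false positive
            fn[gold] += 1  # gold class gains a false negative
    f1_scores = []
    for c in labels:
        p_den, r_den = tp[c] + fp[c], tp[c] + fn[c]
        precision = tp[c] / p_den if p_den else 0.0
        recall = tp[c] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1_scores.append(2 * precision * recall / pr_sum if pr_sum else 0.0)
    # Each class contributes equally, regardless of how often it occurs.
    return sum(f1_scores) / len(f1_scores) if f1_scores else 0.0


# Hypothetical veracity labels for five claims.
gold = ["true", "true", "false", "false", "mixture"]
pred = ["true", "false", "false", "false", "true"]
print(round(macro_f1(gold, pred), 3))
```

Here "false" scores F1 = 0.8, "true" scores 0.5, and the never-predicted "mixture" scores 0, giving a macro average of about 0.433. A classifier that ignores minority labels is penalized heavily under this metric, which is one reason a Macro F1 near 49% signals a hard task.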

