Zero-shot Fact Verification by Claim Generation
This paper addresses the high cost of dataset creation for fact verification models, particularly in new domains, though the contribution is incremental because it builds on existing neural models and datasets.
The paper tackles the high cost of creating human-annotated datasets for fact verification in new domains. It develops QACG, a framework that automatically generates claims from evidence, reducing the need for manual annotation. Experiments on FEVER show that it improves a RoBERTa model's F1 from 50% to 77% in a zero-shot scenario, matching the performance obtained with 2K+ manually curated examples.
Neural models for automated fact verification have achieved promising results thanks to the availability of large, human-annotated datasets. However, for each new domain that requires fact verification, creating a dataset by manually writing claims and linking them to their supporting evidence is expensive. We develop QACG, a framework for training a robust fact verification model by using automatically generated claims that can be supported, refuted, or unverifiable from evidence from Wikipedia. QACG generates question-answer pairs from the evidence and then converts them into different types of claims. Experiments on the FEVER dataset show that our QACG framework significantly reduces the demand for human-annotated training data. In a zero-shot scenario, QACG improves a RoBERTa model's F1 from 50% to 77%, equivalent in performance to 2K+ manually-curated examples. Our QACG code is publicly available.
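The two-stage idea described above (question-answer pairs first, claims second) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `QAPair`, `Claim`, `qa_to_statement`, and `generate_claims` are hypothetical, the QA pairs are supplied by hand rather than produced by a question generator, and a simple string rule stands in for whatever learned model converts a QA pair into a declarative claim. The strategies for the refuted and unverifiable cases (swapping in a wrong answer, borrowing a QA pair from unrelated evidence) are likewise assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class QAPair:
    """A question-answer pair assumed to have been generated from evidence."""
    question: str
    answer: str


@dataclass
class Claim:
    text: str
    label: str  # "supported", "refuted", or "unverifiable"


def qa_to_statement(question: str, answer: str) -> str:
    """Toy stand-in for a QA-to-claim converter (hypothetical rule).

    Handles only "Who ...?" / "What ...?" questions by substituting the
    answer for the question word; anything else falls back to a plain
    "question: answer" form.
    """
    body = question.rstrip("?").strip()
    first, _, rest = body.partition(" ")
    if first.lower() in {"who", "what"} and rest:
        return f"{answer} {rest}."
    return f"{body}: {answer}."


def generate_claims(qa: QAPair, wrong_answer: str, unrelated_qa: QAPair) -> list[Claim]:
    """Build one claim of each type for a single piece of evidence.

    - supported: the QA pair is stated with its correct answer
    - refuted: the answer is replaced by an incorrect one (assumed strategy)
    - unverifiable: a claim built from a QA pair about different evidence
      (assumed strategy)
    """
    return [
        Claim(qa_to_statement(qa.question, qa.answer), "supported"),
        Claim(qa_to_statement(qa.question, wrong_answer), "refuted"),
        Claim(qa_to_statement(unrelated_qa.question, unrelated_qa.answer), "unverifiable"),
    ]


# Usage: evidence about Hamlet, with a hand-written QA pair.
claims = generate_claims(
    QAPair("Who wrote Hamlet?", "William Shakespeare"),
    wrong_answer="Christopher Marlowe",
    unrelated_qa=QAPair("What is the capital of France?", "Paris"),
)
for claim in claims:
    print(f"{claim.label}: {claim.text}")
```

Pairing each generated claim with its source evidence and label would give training examples for a verification model without any human-written claims, which is the setting the zero-shot experiments evaluate.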