Robust Information Retrieval for False Claims with Distracting Entities in Fact Extraction and Verification
This work addresses a specific challenge in automated fact-checking: making evidence retrieval robust to distracting entities in false claims. The contribution is incremental, since it builds on existing BERT-based methods.
The paper tackled the problem of evidence retrieval for false claims containing irrelevant entities, which distract retrieval models, and found that augmenting training data with synthetic false claims and using model ensembles increased evidence recall for such claims.
Accurate evidence retrieval is essential for automated fact checking. Little previous research has focused on the differences between true and false claims and how they affect evidence retrieval. This paper shows that, compared with true claims, false claims more frequently contain irrelevant entities, which can distract evidence retrieval models. A BERT-based retrieval model made more mistakes in retrieving refuting evidence for false claims than supporting evidence for true claims. When tested with synthetically generated adversarial false claims containing irrelevant entities, the retrieval model's recall was significantly lower than for the original claims. These results suggest that the vanilla BERT-based retrieval model is not robust to irrelevant entities in false claims. When the training data was augmented with synthetic false claims containing irrelevant entities, the trained model achieved higher evidence recall, including for false claims with irrelevant entities. In addition, using separate models to retrieve refuting and supporting evidence and then aggregating their results also increased evidence recall, including for false claims with irrelevant entities. These results suggest that the robustness of a BERT-based retrieval model to false claims with irrelevant entities can be increased through data augmentation and model ensembling.
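The two remedies described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the claim template in `add_distracting_entity`, the max-score merging rule in `aggregate_scores`, and the sentence ids and scores are all hypothetical, chosen only to show how a distracting entity might be injected into a claim and how the outputs of a supporting-evidence retriever and a refuting-evidence retriever might be combined before measuring evidence recall.

```python
from typing import Dict, List, Set


def add_distracting_entity(claim: str, entity: str) -> str:
    """Build a synthetic claim by appending an irrelevant entity.

    The template is an assumption for illustration; the paper's actual
    generation procedure is not described in this summary.
    """
    return f"{claim.rstrip('.')} alongside {entity}."


def aggregate_scores(
    support: Dict[str, float], refute: Dict[str, float], k: int
) -> List[str]:
    """Merge two retrievers' sentence scores and return the top-k ids.

    Each sentence keeps the higher of its two scores (an assumed rule);
    ties are broken by sentence id so the ranking is deterministic.
    """
    merged = dict(support)
    for sent_id, score in refute.items():
        merged[sent_id] = max(score, merged.get(sent_id, float("-inf")))
    ranked = sorted(merged, key=lambda s: (-merged[s], s))
    return ranked[:k]


def evidence_recall(retrieved: List[str], gold: Set[str]) -> float:
    """Fraction of gold evidence sentences present in the retrieved list."""
    if not gold:
        return 0.0
    return len(gold & set(retrieved)) / len(gold)


# Hypothetical scores from two separately trained retrievers.
support_scores = {"s1": 0.9, "s2": 0.2, "s3": 0.1}
refute_scores = {"s2": 0.8, "s4": 0.7}
gold_evidence = {"s2", "s4"}

support_only = sorted(support_scores, key=lambda s: (-support_scores[s], s))[:3]
ensemble = aggregate_scores(support_scores, refute_scores, k=3)

print(add_distracting_entity("The film was released in 1999.", "Tom Hanks"))
print(evidence_recall(support_only, gold_evidence))  # 0.5
print(evidence_recall(ensemble, gold_evidence))      # 1.0
```

In this toy example the supporting-evidence retriever alone misses one of the two gold sentences, while the merged ranking recovers both. Other aggregation rules, such as summing scores or interleaving the two ranked lists, would fit the same interface.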