Harnessing Abstractive Summarization for Fact-Checked Claim Detection
This work addresses misinformation spread on social media by partially automating fact-checking, though it is incremental as it builds on existing summarization and retrieval methods.
The paper tackles the problem of detecting previously fact-checked claims on social media by proposing a workflow that uses abstractive summarization to generate queries, improving retrieval performance with Recall@5 and MRR increasing from 10% and 0.1 to 35% and 0.3.
Social media platforms have become new battlegrounds for anti-social elements, with misinformation being the weapon of choice. Fact-checking organizations try to debunk as many claims as possible while staying true to their journalistic processes but cannot cope with its rapid dissemination. We believe that the solution lies in partial automation of the fact-checking life cycle, saving human time for tasks which require high cognition. We propose a new workflow for efficiently detecting previously fact-checked claims that uses abstractive summarization to generate crisp queries. These queries can then be executed on a general-purpose retrieval system associated with a collection of previously fact-checked claims. We curate an abstractive text summarization dataset comprising noisy claims from Twitter and their gold summaries. It is shown that retrieval performance improves 2x by using popular out-of-the-box summarization models and 3x by fine-tuning them on the accompanying dataset compared to verbatim querying. Our approach achieves Recall@5 and MRR of 35% and 0.3, compared to baseline values of 10% and 0.1, respectively. Our dataset, code, and models are available publicly: https://github.com/varadhbhatnagar/FC-Claim-Det/