Integrating Stance Detection and Fact Checking in a Unified Corpus
This work addresses the problem of fragmented datasets for fact-checking systems, particularly for Arabic-language applications, although the contribution is incremental because it builds on existing tasks.
The paper tackles the lack of datasets that support integrated fact-checking tasks. It builds a unified corpus with interdependent annotations for document retrieval, stance detection, source credibility, and rationale extraction. The setup is implemented on an Arabic fact-checking corpus, the first of its kind.
A reasonable approach to fact-checking a claim has three steps. First, retrieve potentially relevant documents from different sources (e.g., news websites and social media). Second, determine the stance of each document with respect to the claim. Finally, predict the claim's factuality by aggregating the strength of the stances while taking the reliability of each source into account. Moreover, a fact-checking system should be able to explain its decision by providing relevant extracts (rationales) from the documents. Yet existing datasets do not directly support this setup, because they treat fact checking, document retrieval, source credibility, stance detection, and rationale extraction as independent tasks. In this paper, we capture the interdependencies between these tasks by annotating them in the same corpus. We implement this setup on an Arabic fact-checking corpus, the first of its kind.
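The pipeline described above (retrieve documents, detect each document's stance, weight the stances by source reliability, aggregate them into a verdict, and explain the verdict with rationales) can be sketched as follows. This is a minimal illustration and not the paper's method. The record layout, the four stance labels (borrowed from common stance-detection setups), the assumption that stance confidence and source credibility are scores in [0, 1], and the credibility-weighted vote in the hypothetical `aggregate_factuality` function are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical stance labels mapped to a sign: +1 supports the claim,
# -1 refutes it, 0 carries no signal about factuality.
STANCE_SIGN = {"agree": 1.0, "disagree": -1.0, "discuss": 0.0, "unrelated": 0.0}


@dataclass
class Document:
    """One retrieved document annotated jointly with its claim (hypothetical schema)."""
    source: str
    stance: str                # one of the keys of STANCE_SIGN
    stance_confidence: float   # strength of the stance, assumed to be in [0, 1]
    source_credibility: float  # reliability of the source, assumed to be in [0, 1]
    rationales: List[str] = field(default_factory=list)  # evidence snippets


@dataclass
class Claim:
    text: str
    documents: List[Document]


def aggregate_factuality(claim: Claim) -> Tuple[str, float, List[str]]:
    """Credibility-weighted stance vote (an assumed aggregation rule, not the paper's)."""
    score = 0.0
    total_weight = 0.0
    for doc in claim.documents:
        sign = STANCE_SIGN[doc.stance]
        if sign == 0.0:
            continue  # 'discuss' and 'unrelated' documents do not vote
        weight = doc.stance_confidence * doc.source_credibility
        score += sign * weight
        total_weight += weight
    if total_weight == 0.0:
        return "unverified", 0.0, []  # no document takes a side
    normalized = score / total_weight  # lies in [-1, 1]
    label = "true" if normalized > 0.0 else "false"
    # Explain the verdict with rationales from documents on the winning side.
    winning_sign = 1.0 if label == "true" else -1.0
    evidence = [r for doc in claim.documents
                if STANCE_SIGN[doc.stance] == winning_sign
                for r in doc.rationales]
    return label, normalized, evidence


claim = Claim(
    text="Example claim",
    documents=[
        Document("site-a", "agree", 0.9, 0.8, ["snippet supporting the claim"]),
        Document("site-b", "disagree", 0.6, 0.3, ["snippet refuting the claim"]),
        Document("site-c", "unrelated", 0.99, 0.9),
    ],
)
label, score, evidence = aggregate_factuality(claim)
print(label, round(score, 2), evidence)  # true 0.6 ['snippet supporting the claim']
```

The weight of each vote multiplies stance strength by source credibility. As a result, a confident stance from an unreliable source counts for less than a moderate stance from a reliable one. The returned rationales illustrate how interdependent annotations let one record drive both the verdict and its explanation.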