CL · Jun 6, 2022

CHEF: A Pilot Chinese Dataset for Evidence-Based Fact-Checking

Xuming Hu, Zhijiang Guo, Guanyu Wu, Aiwei Liu, Lijie Wen, Philip S. Yu

Tsinghua

arXiv:2206.11863v1646 citations · h-index: 43 · Has Code

Originality Synthesis-oriented

AI Analysis

This work addresses misinformation in Chinese, a problem relevant to both researchers and developers, but the contribution is incremental: it extends existing fact-checking methods to a new language.

The authors tackle the lack of Chinese resources for automated fact-checking by constructing CHEF, a dataset of 10K real-world claims with annotated evidence, and by developing a novel approach that models evidence retrieval as a latent variable. Their experiments indicate that CHEF provides a challenging testbed for non-English fact-checking systems.

The explosion of misinformation spreading in the media ecosystem calls for automated fact-checking. While misinformation spans both geographic and linguistic boundaries, most work in the field has focused on English. Datasets and tools available in other languages, such as Chinese, are limited. To bridge this gap, we construct CHEF, the first CHinese Evidence-based Fact-checking dataset of 10K real-world claims. The dataset covers multiple domains, ranging from politics to public health, and provides annotated evidence retrieved from the Internet. Further, we implement established baselines and develop a novel approach that models evidence retrieval as a latent variable, allowing it to be trained jointly with the veracity prediction model in an end-to-end fashion. Extensive experiments show that CHEF will provide a challenging testbed for the development of fact-checking systems designed to retrieve and reason over non-English claims.
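To make the "evidence retrieval as a latent variable" idea concrete, here is a minimal sketch in plain Python. It is not the authors' model: it assumes the latent variable z selects exactly one candidate evidence sentence, that a retriever assigns each sentence a score, and that a classifier outputs label probabilities given the claim and that sentence. The function names, scores, and three-label setup are all hypothetical. Under those assumptions, the veracity probability is the marginal p(y | claim) = Σ_z p(z | claim) · p(y | claim, z), and a single negative log-likelihood loss depends on both the retriever scores and the classifier outputs, which is what allows joint training.

```python
import math


def softmax(scores):
    """Turn raw retrieval scores into a distribution p(z | claim)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def marginal_label_probs(retrieval_scores, label_probs_per_sentence):
    """Compute p(y | claim) = sum_z p(z | claim) * p(y | claim, z).

    Assumption: z picks exactly one candidate sentence (a simplification).
    """
    selection = softmax(retrieval_scores)
    num_labels = len(label_probs_per_sentence[0])
    return [
        sum(p_z * probs[y] for p_z, probs in zip(selection, label_probs_per_sentence))
        for y in range(num_labels)
    ]


def joint_loss(retrieval_scores, label_probs_per_sentence, gold_label):
    """Negative log marginal likelihood of the gold veracity label."""
    probs = marginal_label_probs(retrieval_scores, label_probs_per_sentence)
    return -math.log(probs[gold_label])


# Hypothetical toy input: three candidate sentences, three veracity labels.
scores = [2.0, 0.0, -1.0]          # retriever scores, one per sentence
label_probs = [
    [0.8, 0.1, 0.1],               # classifier output given sentence 0
    [0.2, 0.7, 0.1],               # classifier output given sentence 1
    [0.3, 0.3, 0.4],               # classifier output given sentence 2
]
probs = marginal_label_probs(scores, label_probs)
loss = joint_loss(scores, label_probs, gold_label=0)
```

In this toy setup, raising the retriever score of a sentence that supports the gold label lowers the loss, so the veracity signal alone can shape which evidence is selected. Real systems typically select several sentences and need an estimator for the resulting discrete choice; this sketch sidesteps that by enumerating single-sentence selections.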
