CL, Jan 27, 2024

Do We Need Language-Specific Fact-Checking Models? The Case of Chinese

Caiqi Zhang, Zhijiang Guo, Andreas Vlachos

Cambridge

arXiv:2401.15498v3 16.429 citations h-index: 26 EMNLP

Originality: Incremental advance

AI Analysis

This work addresses the need for more accurate and robust fact-checking systems in non-English languages such as Chinese, though the advance is incremental because it builds on existing datasets and methods.

The paper tackles fact-checking in Chinese. It demonstrates the limitations of translation-based methods and multilingual LLMs, and proposes a language-specific system that outperforms these baselines and is more robust to biases on both the CHEF dataset and an adversarial dataset built from it.

This paper investigates the potential benefits of language-specific fact-checking models, focusing on the case of Chinese. We first demonstrate the limitations of translation-based methods and multilingual large language models (e.g., GPT-4), highlighting the need for language-specific systems. We further propose a Chinese fact-checking system that can better retrieve evidence from a document by incorporating context information. To better analyze token-level biases in different systems, we construct an adversarial dataset based on the CHEF dataset, where each instance has large word overlap with the original one but holds the opposite veracity label. Experimental results on the CHEF dataset and our adversarial dataset show that our proposed method outperforms translation-based methods and multilingual LLMs and is more robust toward biases, while there is still large room for improvement, emphasizing the importance of language-specific fact-checking systems.
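
The adversarial construction described above (large word overlap with the original instance, opposite veracity label) can be illustrated with a minimal sketch. This is not the authors' procedure: the character-level Jaccard measure, the 0.6 threshold, the label strings, and the two example claims are all assumptions made up for illustration and are not taken from CHEF.

```python
from typing import Tuple

Instance = Tuple[str, str]  # (claim text, veracity label); hypothetical format


def char_jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the sets of characters in two strings.

    Characters are used as a rough stand-in for tokens, which avoids
    needing a Chinese word segmenter (an assumption of this sketch).
    """
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def is_adversarial_pair(orig: Instance, adv: Instance, threshold: float = 0.6) -> bool:
    """True if the two claims overlap heavily but carry different labels."""
    claim_o, label_o = orig
    claim_a, label_a = adv
    return label_o != label_a and char_jaccard(claim_o, claim_a) >= threshold


# Invented example pair: the claims differ only in the final two characters.
original = ("该城市去年人口增长", "SUPPORTED")
perturbed = ("该城市去年人口减少", "REFUTED")

overlap = char_jaccard(original[0], perturbed[0])
flagged = is_adversarial_pair(original, perturbed)
```

A model that relies on surface tokens rather than evidence would tend to assign such a pair the same label, which is the kind of token-level bias the adversarial dataset is meant to expose.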
