CiteCheck: Towards Accurate Citation Faithfulness Detection
This work addresses a domain-specific problem for Chinese retrieval-augmented generation (RAG) systems by providing a challenging dataset, but the contribution is incremental because it builds on existing detection tasks.
The authors address the scarcity of large-scale Chinese datasets for citation faithfulness detection by introducing CiteCheck, a dataset built with a cost-effective two-stage manual annotation method that balances positive and negative samples. Training data augmented with LLM-generated negative samples then lets smaller models achieve strong performance with parameter-efficient fine-tuning.
Citation faithfulness detection is critical for enhancing retrieval-augmented generation (RAG) systems, yet large-scale Chinese datasets for this task are scarce. Existing methods face prohibitive costs due to the need for manually annotated negative samples. To address this, we introduce the first large-scale Chinese dataset CiteCheck for citation faithfulness detection, constructed via a cost-effective approach using two-stage manual annotation. This method balances positive and negative samples while significantly reducing annotation expenses. CiteCheck comprises training and test splits. Experiments demonstrate that: (1) the test samples are highly challenging, with even state-of-the-art LLMs failing to achieve high accuracy; and (2) training data augmented with LLM-generated negative samples enables smaller models to attain strong performance using parameter-efficient fine-tuning. CiteCheck provides a robust foundation for advancing citation faithfulness detection in Chinese RAG systems. The dataset is publicly available to facilitate research.
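To make the idea of balancing positive and negative samples concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' pipeline: the `CitationSample` fields, the `balance_with_negatives` helper, and the `accuracy` metric are hypothetical names chosen for this example, and the abstract does not describe the actual data schema or sampling procedure.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CitationSample:
    # Hypothetical schema; the real dataset fields may differ.
    statement: str   # generated sentence that carries a citation
    citation: str    # text of the cited passage
    faithful: bool   # True if the cited passage supports the statement


def balance_with_negatives(annotated, generated_negatives, seed=0):
    """Top up negatives from a generated pool until classes are balanced.

    Assumption: `generated_negatives` stands in for LLM-generated
    unfaithful samples; only items labeled unfaithful are drawn from it.
    """
    positives = [s for s in annotated if s.faithful]
    negatives = [s for s in annotated if not s.faithful]
    shortfall = max(0, len(positives) - len(negatives))

    rng = random.Random(seed)
    pool = [s for s in generated_negatives if not s.faithful]
    # Draw without replacement, capped by the size of the pool.
    extra = rng.sample(pool, min(shortfall, len(pool)))

    balanced = positives + negatives + extra
    rng.shuffle(balanced)
    return balanced


def accuracy(predictions, samples):
    """Fraction of boolean predictions that match the gold labels."""
    if not samples or len(predictions) != len(samples):
        raise ValueError("predictions and samples must be non-empty and aligned")
    correct = sum(p == s.faithful for p, s in zip(predictions, samples))
    return correct / len(samples)


annotated = [
    CitationSample("claim A", "passage A", True),
    CitationSample("claim B", "passage B", True),
    CitationSample("claim C", "passage C", True),
    CitationSample("claim D", "passage D", False),
]
generated = [CitationSample(f"claim G{i}", f"passage G{i}", False) for i in range(5)]
balanced = balance_with_negatives(annotated, generated)
```

On a balanced split like this, a detector that always predicts "faithful" scores 0.5 under `accuracy`, which is why class balance makes accuracy a meaningful measure of how challenging the test samples are.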