CL · IR · Aug 17, 2025

A Question Answering Dataset for Temporal-Sensitive Retrieval-Augmented Generation

Ziyang Chen, Erxue Min, Xiang Zhao, Yunxin Li, Xin Jia, Jinzhi Liao, Jichao Li, Shuaiqiang Wang, Baotian Hu, Dawei Yin

arXiv:2508.12282v16.72 citations · h-index: 17 · Sci Data

Originality: Synthesis-oriented

AI Analysis

ChronoQA provides a benchmark for advancing time-sensitive RAG systems, addressing a domain-specific need in natural language processing.

The authors address the problem of evaluating temporal reasoning in Retrieval-Augmented Generation (RAG) systems by introducing ChronoQA, a large-scale Chinese question answering dataset. It contains 5,176 high-quality questions derived from over 300,000 news articles published between 2019 and 2024, and it enables structured evaluation across a range of temporal tasks.

We introduce ChronoQA, a large-scale benchmark dataset for Chinese question answering, specifically designed to evaluate temporal reasoning in Retrieval-Augmented Generation (RAG) systems. ChronoQA is constructed from over 300,000 news articles published between 2019 and 2024, and contains 5,176 high-quality questions covering absolute, aggregate, and relative temporal types with both explicit and implicit time expressions. The dataset supports both single- and multi-document scenarios, reflecting the real-world requirements for temporal alignment and logical consistency. ChronoQA features comprehensive structural annotations and has undergone multi-stage validation, including rule-based, LLM-based, and human evaluation, to ensure data quality. By providing a dynamic, reliable, and scalable resource, ChronoQA enables structured evaluation across a wide range of temporal tasks, and serves as a robust benchmark for advancing time-sensitive retrieval-augmented question answering systems.
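The structured evaluation described in the abstract could be sketched as follows: score predictions separately for each combination of temporal type, time-expression style, and document scenario. This is only a minimal illustration under stated assumptions. The `Question` class, its field names, the exact-match metric, and the toy records are all hypothetical and are not taken from the ChronoQA release, whose actual schema and metrics may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    # Hypothetical schema: field names are assumptions, not ChronoQA's real format.
    text: str
    answer: str
    temporal_type: str    # "absolute", "aggregate", or "relative"
    time_expression: str  # "explicit" or "implicit"
    scenario: str         # "single" or "multi" (number of source documents)


def accuracy_by_slice(questions, predictions):
    """Exact-match accuracy per (temporal_type, time_expression, scenario)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for question, prediction in zip(questions, predictions):
        key = (question.temporal_type, question.time_expression, question.scenario)
        total[key] += 1
        correct[key] += int(prediction.strip() == question.answer.strip())
    return {key: correct[key] / total[key] for key in total}


# Toy placeholder records, invented purely for illustration.
toy_questions = [
    Question("Q1", "2021", "absolute", "explicit", "single"),
    Question("Q2", "3", "aggregate", "implicit", "multi"),
    Question("Q3", "2020", "absolute", "explicit", "single"),
]
toy_predictions = ["2021", "4", "2019"]

scores = accuracy_by_slice(toy_questions, toy_predictions)
for slice_key, accuracy in sorted(scores.items()):
    print(slice_key, accuracy)
```

Reporting one number per slice rather than a single aggregate makes it visible when a system handles explicit, single-document questions well but fails on implicit or multi-document ones.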
