CL, Jun 7, 2024

ComplexTempQA: A 100m Dataset for Complex Temporal Question Answering

Raphael Gruber, Abdelrahman Abdallah, Michael Färber, Adam Jatowt

arXiv:2406.04866

Originality: Synthesis-oriented

AI Analysis

ComplexTempQA provides a large-scale resource for evaluating temporal reasoning in AI systems, addressing the need for more complex temporal QA tasks.

The authors address the challenge of temporal question answering by introducing ComplexTempQA, a dataset of over 100 million question-answer pairs that surpasses existing benchmarks in scale and complexity and covers questions spanning more than two decades.

We introduce ComplexTempQA (dataset and code available at https://github.com/DataScienceUIBK/ComplexTempQA), a large-scale dataset of over 100 million question-answer pairs designed to tackle the challenges of temporal question answering. ComplexTempQA significantly surpasses existing benchmarks in scale and scope. Built from Wikipedia and Wikidata, the dataset covers questions spanning more than two decades. We introduce a new taxonomy that categorizes questions as attribute, comparison, and counting questions, revolving around events, entities, and time periods, respectively. A standout feature of ComplexTempQA is the high complexity of its questions, which demand reasoning capabilities such as across-time comparison, temporal aggregation, and multi-hop reasoning involving temporal event ordering and entity recognition. Additionally, each question is accompanied by detailed metadata, including specific time scopes, allowing for a comprehensive evaluation of the temporal reasoning abilities of large language models.
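To make the taxonomy and the time-scope metadata more concrete, here is a minimal sketch of how such question records might be represented and sliced by time scope. It assumes a hypothetical schema: the names `QARecord`, `QuestionType`, `Subject`, `start`, `end`, and `count_by_type` are illustrative only and are not the released dataset's format. The toy records are placeholders, not real dataset entries.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class QuestionType(Enum):
    # The three question categories named in the abstract.
    ATTRIBUTE = "attribute"
    COMPARISON = "comparison"
    COUNTING = "counting"


class Subject(Enum):
    # What a question revolves around, per the abstract.
    EVENT = "event"
    ENTITY = "entity"
    TIME_PERIOD = "time_period"


@dataclass(frozen=True)
class QARecord:
    # Hypothetical schema: field names are illustrative, not the released format.
    question: str
    answer: str
    qtype: QuestionType
    subject: Subject
    start: date  # start of the question's time scope
    end: date    # end of the question's time scope (inclusive)

    def overlaps(self, lo: date, hi: date) -> bool:
        """True if this record's time scope intersects the window [lo, hi]."""
        return self.start <= hi and lo <= self.end


def count_by_type(records, lo: date, hi: date) -> dict:
    """Tally question types whose time scope overlaps [lo, hi]."""
    counts = {t: 0 for t in QuestionType}
    for r in records:
        if r.overlaps(lo, hi):
            counts[r.qtype] += 1
    return counts


# Toy placeholder records (not taken from the dataset).
records = [
    QARecord("toy q1", "toy a1", QuestionType.COUNTING, Subject.EVENT,
             date(2001, 1, 1), date(2001, 12, 31)),
    QARecord("toy q2", "toy a2", QuestionType.COMPARISON, Subject.ENTITY,
             date(2010, 3, 1), date(2012, 6, 30)),
    QARecord("toy q3", "toy a3", QuestionType.ATTRIBUTE, Subject.TIME_PERIOD,
             date(2019, 1, 1), date(2019, 12, 31)),
]

if __name__ == "__main__":
    window = count_by_type(records, date(2000, 1, 1), date(2011, 1, 1))
    print({t.value: n for t, n in window.items()})
```

Storing the time scope as an explicit interval makes it straightforward to evaluate a model on a particular period, for example only questions whose scope overlaps a given decade, which is the kind of fine-grained analysis that per-question time metadata enables.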

