TransientTables: Evaluating LLMs' Reasoning on Temporally Evolving Semi-structured Tables
This work addresses the challenge of temporal reasoning in LLMs, which matters for applications such as science and decision-making, but its contribution is incremental: it focuses on dataset creation and baseline improvements.
The authors tackled the problem of large language models' limited ability to reason over time by creating the TRANSIENTTABLES dataset with 3,971 questions from over 14,000 tables across 1,238 entities, and they introduced modeling strategies that improved LLM performance.
Humans continuously make new discoveries, and understanding the temporal sequence of events leading to these breakthroughs is essential for advancing science and society. This ability to reason over time allows us to identify future steps and understand the effects of financial and political decisions on our lives. However, large language models (LLMs) are typically trained on static datasets, limiting their ability to perform effective temporal reasoning. To assess the temporal reasoning capabilities of LLMs, we present the TRANSIENTTABLES dataset, which comprises 3,971 questions derived from over 14,000 tables, spanning 1,238 entities across multiple time periods. We introduce a template-based question-generation pipeline that harnesses LLMs to refine both templates and questions. Additionally, we establish baseline results using state-of-the-art LLMs to create a benchmark. We also introduce novel modeling strategies centered around task decomposition, enhancing LLM performance.
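To make the idea of template-based question generation over a timeline of tables concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not the authors' pipeline: the `TableSnapshot` type, the `CHANGE_TEMPLATE` wording, the `generate_change_questions` helper, and the example entity and values are all hypothetical, and the LLM-based refinement of templates and questions mentioned in the abstract is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSnapshot:
    """One version of an entity's table at a given point in time (hypothetical type)."""
    year: int
    fields: dict


# Hypothetical template; the wording of the paper's actual templates is not given here.
CHANGE_TEMPLATE = "Between {start} and {end}, did the {key} of {entity} change?"


def generate_change_questions(entity, timeline, key):
    """Fill the template for each consecutive pair of snapshots that contain `key`."""
    # Order snapshots chronologically so each pair describes a forward step in time.
    ordered = sorted(timeline, key=lambda snapshot: snapshot.year)
    with_key = [snapshot for snapshot in ordered if key in snapshot.fields]

    questions = []
    for earlier, later in zip(with_key, with_key[1:]):
        question = CHANGE_TEMPLATE.format(
            start=earlier.year, end=later.year, key=key, entity=entity
        )
        # The gold answer is derived directly from the two table versions.
        answer = "yes" if earlier.fields[key] != later.fields[key] else "no"
        questions.append((question, answer))
    return questions


# Made-up example data, deliberately out of chronological order.
timeline = [
    TableSnapshot(2012, {"head coach": "A"}),
    TableSnapshot(2010, {"head coach": "A"}),
    TableSnapshot(2015, {"head coach": "B"}),
]
qa_pairs = generate_change_questions("Example FC", timeline, "head coach")
```

Each question here depends on two versions of the same table, so answering it requires comparing values across time rather than reading a single snapshot. That is the kind of reasoning the dataset is described as testing.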