On the Temporal Question-Answering Capabilities of Large Language Models Over Anonymized Data
This paper addresses temporal reasoning in LLMs for applications that require privacy-preserving data handling. Its contribution is incremental, since it builds on existing techniques.
The paper investigates how well large language models (LLMs) perform on temporal reasoning tasks over anonymized data not seen during training. It finds that standalone LLMs are insufficient for scalable and reliable solutions, and that integrated approaches such as Tree-of-Thought and code execution perform better on the authors' custom RATA dataset.
The applicability of Large Language Models (LLMs) to temporal reasoning tasks over data not present during training remains largely unexplored. In this paper we address this topic, focusing on structured and semi-structured anonymized data. We develop a direct LLM pipeline, compare it against several alternative methodologies, and conduct an in-depth analysis. We identify and examine seventeen common temporal reasoning tasks in natural language, focusing on their algorithmic components. To assess LLM performance, we create the \textit{Reasoning and Answering Temporal Ability} dataset (RATA), which features semi-structured anonymized data to ensure reliance on reasoning rather than on prior knowledge. We compare several methodologies involving state-of-the-art techniques such as Tree-of-Thought, self-reflexion and code execution, each tuned specifically for this scenario. Our results suggest that achieving scalable and reliable solutions requires more than standalone LLMs, highlighting the need for integrated approaches.
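To make the code-execution idea concrete, the following is a minimal sketch of the kind of program a model might emit instead of answering a temporal question directly. It is an illustration under stated assumptions, not the authors' pipeline: the record layout, the entity IDs (`E17`, `E42`), and the helper names `first_event` and `days_between` are all hypothetical and are not taken from RATA.

```python
from datetime import date

# Hypothetical anonymized records: real names are replaced by opaque IDs,
# so an answer cannot be recalled from prior knowledge and must be computed.
events = [
    {"entity": "E17", "event": "joined", "date": date(2019, 3, 4)},
    {"entity": "E42", "event": "joined", "date": date(2018, 11, 20)},
    {"entity": "E17", "event": "left", "date": date(2021, 3, 4)},
]


def first_event(records, event):
    """Return the record with the earliest date among those of the given event type."""
    matches = [r for r in records if r["event"] == event]
    return min(matches, key=lambda r: r["date"])


def days_between(records, entity, start_event, end_event):
    """Return the number of days between two events of a single entity."""
    by_event = {r["event"]: r["date"] for r in records if r["entity"] == entity}
    return (by_event[end_event] - by_event[start_event]).days


# "Who joined first?" is an ordering task.
earliest = first_event(events, "joined")["entity"]

# "How many days did E17 stay?" is a duration task.
tenure = days_between(events, "E17", "joined", "left")

print(earliest, tenure)
```

Delegating the date arithmetic to an interpreter leaves the model with the task of translating the question into operations such as ordering and duration, which is one plausible reading of why integrated approaches could be more reliable than a standalone LLM.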