CL · Oct 15, 2025

MemoTime: Memory-Augmented Temporal Knowledge Graph Enhanced Large Language Model Reasoning

arXiv:2510.13614v1 · 13 citations · h-index: 5
Originality: Highly original
AI Analysis

This work addresses the temporal reasoning limitations of LLMs in applications that require complex temporal understanding. It proposes a new method rather than an incremental refinement of existing ones.

The paper tackles temporal reasoning in Large Language Models (LLMs) by proposing MemoTime, a memory-augmented temporal knowledge graph framework that targets challenges such as multi-hop reasoning and multi-entity temporal synchronization. It reports state-of-the-art results, with up to a 24.0% improvement over the strong baseline, and enables smaller models (e.g., Qwen3-4B) to reach reasoning performance comparable to that of GPT-4-Turbo.

Large Language Models (LLMs) have achieved impressive reasoning abilities, but struggle with temporal understanding, especially when questions involve multiple entities, compound operators, and evolving event sequences. Temporal Knowledge Graphs (TKGs), which capture vast amounts of temporal facts in a structured format, offer a reliable source for temporal reasoning. However, existing TKG-based LLM reasoning methods still struggle with four major challenges: maintaining temporal faithfulness in multi-hop reasoning, achieving multi-entity temporal synchronization, adapting retrieval to diverse temporal operators, and reusing prior reasoning experience for stability and efficiency. To address these issues, we propose MemoTime, a memory-augmented temporal knowledge graph framework that enhances LLM reasoning through structured grounding, recursive reasoning, and continual experience learning. MemoTime decomposes complex temporal questions into a hierarchical Tree of Time, enabling operator-aware reasoning that enforces monotonic timestamps and co-constrains multiple entities under unified temporal bounds. A dynamic evidence retrieval layer adaptively selects operator-specific retrieval strategies, while a self-evolving experience memory stores verified reasoning traces, toolkit decisions, and sub-question embeddings for cross-type reuse. Comprehensive experiments on multiple temporal QA benchmarks show that MemoTime achieves overall state-of-the-art results, outperforming the strong baseline by up to 24.0%. Furthermore, MemoTime enables smaller models (e.g., Qwen3-4B) to achieve reasoning performance comparable to that of GPT-4-Turbo.
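The constraints named in the abstract (monotonic timestamps along a reasoning path, a shared temporal window across entities, and retrieval that depends on the temporal operator) can be illustrated with a toy sketch. This is not the MemoTime implementation: the `Fact` type, the function names, and the three operators `before`, `after`, and `equal` are assumptions made here for illustration only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    """One temporal fact (hypothetical schema): subject, relation, object, date."""
    subject: str
    relation: str
    obj: str
    when: date


def is_monotonic(path):
    """Return True if timestamps never decrease along a multi-hop path."""
    return all(a.when <= b.when for a, b in zip(path, path[1:]))


def within_window(facts, start, end):
    """Keep only facts inside one shared [start, end] window (inclusive)."""
    return [f for f in facts if start <= f.when <= end]


def retrieve(facts, operator, anchor):
    """Toy operator-aware retrieval: pick a filter from the temporal operator."""
    if operator == "before":
        hits = [f for f in facts if f.when < anchor]
    elif operator == "after":
        hits = [f for f in facts if f.when > anchor]
    elif operator == "equal":
        hits = [f for f in facts if f.when == anchor]
    else:
        raise ValueError(f"unsupported operator: {operator}")
    # Return hits in chronological order.
    return sorted(hits, key=lambda f: f.when)
```

In this sketch, a candidate multi-hop path would be rejected when `is_monotonic` returns False, and facts about several entities would be passed through `within_window` with the same bounds before being compared. How MemoTime actually enforces these constraints is described in the paper itself, not here.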

