CL · Nov 1, 2025

ToM: Leveraging Tree-oriented MapReduce for Long-Context Reasoning in Large Language Models

Jiani Guo, Zuchao Li, Jie Wu, Qianren Wang, Yun Li, Lefei Zhang, Hai Zhao, Yujiu Yang

arXiv:2511.00489v1 · 6.72 citations · h-index: 8 · Has Code · EMNLP

Originality: Highly original

AI Analysis

This work addresses the challenge of maintaining logical coherence when large language models reason over long contexts. It offers a novel method for a known bottleneck rather than a foundational breakthrough.

The paper tackles the problem of performance degradation in large language models when reasoning over long contexts by proposing ToM, a Tree-oriented MapReduce framework that leverages hierarchical document structures. Experimental results on 70B+ LLMs show that ToM significantly outperforms existing divide-and-conquer and retrieval-augmented generation methods in logical coherence and long-context reasoning.

Large Language Models (LLMs), constrained by limited context windows, often face significant performance degradation when reasoning over long contexts. To address this, Retrieval-Augmented Generation (RAG) retrieves and reasons over chunks but frequently sacrifices logical coherence due to its reliance on similarity-based rankings. Similarly, divide-and-conquer frameworks (DCF) split documents into small chunks for independent reasoning and aggregation. While effective for local reasoning, DCF struggles to capture long-range dependencies and risks inducing conflicts by processing chunks in isolation. To overcome these limitations, we propose ToM, a novel Tree-oriented MapReduce framework for long-context reasoning. ToM leverages the inherent hierarchical structure of long documents (e.g., main headings and subheadings) by constructing a DocTree through hierarchical semantic parsing and performing bottom-up aggregation. Using a Tree MapReduce approach, ToM enables recursive reasoning: in the Map step, rationales are generated at child nodes; in the Reduce step, these rationales are aggregated across sibling nodes to resolve conflicts or reach consensus at parent nodes. Experimental results on 70B+ LLMs show that ToM significantly outperforms existing divide-and-conquer frameworks and retrieval-augmented generation methods, achieving better logical coherence and long-context reasoning. Our code is available at https://github.com/gjn12-31/ToM .
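The abstract describes ToM's pipeline only at a high level: build a DocTree from the document's heading hierarchy, then run Map at child nodes and Reduce across siblings at each parent, bottom-up. The Python below is a minimal sketch of that control flow under loudly stated assumptions. The DocTree is built by a naive detector for Markdown-style `#` headings rather than the paper's hierarchical semantic parsing, and the Map and Reduce steps are hypothetical keyword-matching and deduplication stand-ins for the rationale generation and conflict resolution that ToM presumably delegates to the LLM. All names (`DocNode`, `build_doc_tree`, `tree_map_reduce`, `keyword_map`, `dedup_reduce`) are illustrative and are not taken from the authors' repository.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Rationale = List[str]  # toy "rationale": the evidence sentences kept so far
MapFn = Callable[[str, str], Rationale]
ReduceFn = Callable[[str, List[Rationale]], Rationale]


@dataclass
class DocNode:
    """One node of a simplified DocTree: a heading and the text directly under it."""
    title: str
    level: int
    text: str = ""
    children: List["DocNode"] = field(default_factory=list)


def build_doc_tree(markdown: str) -> DocNode:
    """Nest sections by '#' depth (a naive stand-in for hierarchical semantic parsing)."""
    root = DocNode(title="ROOT", level=0)
    stack = [root]  # path from the root to the most recently opened heading
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            level = len(stripped) - len(stripped.lstrip("#"))
            node = DocNode(title=stripped[level:].strip(), level=level)
            # Close sections until the stack top is a strict ancestor of this heading.
            while stack[-1].level >= level:
                stack.pop()
            stack[-1].children.append(node)
            stack.append(node)
        elif stripped:
            current = stack[-1]
            current.text = f"{current.text} {stripped}".strip()
    return root


def tree_map_reduce(node: DocNode, question: str,
                    map_fn: MapFn, reduce_fn: ReduceFn) -> Rationale:
    """Recursive bottom-up pass: Map on each node's own text, Reduce over its children."""
    rationales: List[Rationale] = []
    if node.text:
        rationales.append(map_fn(question, node.text))
    for child in node.children:
        # Each child subtree is fully reduced before its parent sees the result.
        rationales.append(tree_map_reduce(child, question, map_fn, reduce_fn))
    return reduce_fn(question, rationales)


def keyword_map(question: str, text: str) -> Rationale:
    """Hypothetical Map step: keep sentences sharing a longer-than-3-char word with the question."""
    terms = {w.lower().strip("?.,") for w in question.split() if len(w) > 3}
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [s for s in sentences
            if terms & {w.lower().strip(",") for w in s.split()}]


def dedup_reduce(question: str, rationales: List[Rationale]) -> Rationale:
    """Hypothetical Reduce step: merge sibling results, dropping duplicates in order."""
    merged: Rationale = []
    for rationale in rationales:
        for sentence in rationale:
            if sentence not in merged:
                merged.append(sentence)
    return merged  # `question` is unused here; a real Reduce would condition on it


doc = """
# Install
Run the installer.
## Requirements
The server needs Python.
# Usage
Start the server with the run command. The server needs Python.
"""

tree = build_doc_tree(doc)
answer = tree_map_reduce(tree, "What does the server need?", keyword_map, dedup_reduce)
print(answer)
# ['The server needs Python', 'Start the server with the run command']
```

Because each call returns only after all of its children have been reduced, evidence flows strictly bottom-up, and the Reduce at a parent sees every sibling's result at once. That is the point where ToM would reconcile conflicting rationales or reach consensus; this sketch merely removes duplicates, so it illustrates the tree-shaped control flow and nothing about the quality of the reasoning.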

