AI · Jul 23, 2025

Improving LLMs' Generalized Reasoning Abilities by Graph Problems

Qifan Zhang, Nuo Chen, Zehua Li, Miao Peng, Jing Tang, Jia Li

arXiv:2507.17168v1

7.8

Originality: Incremental advance

AI Analysis

This work addresses the need for more adaptable and robust LLMs by bridging domain-specific pretraining and universal reasoning capabilities, though it appears to be an incremental advance, since it builds on existing continued pretraining (CPT) methods.

The researchers tackled LLMs' weak performance on novel and complex reasoning tasks by using Graph Problem Reasoning (GPR) data for continued pretraining, achieving up to 4.9% higher accuracy in mathematical reasoning and up to 21.2% improvement in non-mathematical reasoning tasks.

Large Language Models (LLMs) have made remarkable strides in reasoning tasks, yet their performance often falters on novel and complex problems. Domain-specific continued pretraining (CPT) methods, such as those tailored for mathematical reasoning, have shown promise but lack transferability to broader reasoning tasks. In this work, we pioneer the use of Graph Problem Reasoning (GPR) to enhance the general reasoning capabilities of LLMs. GPR tasks, spanning pathfinding, network analysis, numerical computation, and topological reasoning, require sophisticated logical and relational reasoning, making them ideal for teaching diverse reasoning patterns. To achieve this, we introduce GraphPile, the first large-scale corpus specifically designed for CPT using GPR data. Spanning 10.9 billion tokens across 23 graph tasks, the dataset includes chain-of-thought, program-of-thought, trace of execution, and real-world graph data. Using GraphPile, we train GraphMind on popular base models Llama 3 and 3.1, as well as Gemma 2, achieving up to 4.9 percent higher accuracy in mathematical reasoning and up to 21.2 percent improvement in non-mathematical reasoning tasks such as logical and commonsense reasoning. By being the first to harness GPR for enhancing reasoning patterns and introducing the first dataset of its kind, our work bridges the gap between domain-specific pretraining and universal reasoning capabilities, advancing the adaptability and robustness of LLMs.
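The abstract lists "trace of execution" as one of GraphPile's data types and pathfinding as one of its task families, but it does not show what such a sample looks like. The sketch below is purely illustrative and assumes a plain-text question / trace / answer template for an unweighted shortest-path problem solved by breadth-first search. The function names `bfs_shortest_path_with_trace` and `make_trace_sample` and the text template are hypothetical; this is not the GraphPile format or the authors' generation pipeline.

```python
from collections import deque


def bfs_shortest_path_with_trace(edges, source, target):
    """Breadth-first search on an undirected, unweighted graph.

    Returns (path, trace): the shortest path from source to target as a
    list of nodes (or None if target is unreachable), plus a list of
    human-readable steps describing how the search proceeded.
    """
    # Build an adjacency list from the undirected edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    parent = {source: None}  # also serves as the "visited" set
    queue = deque([source])
    trace = [f"Start at node {source}; queue = [{source}]"]

    while queue:
        node = queue.popleft()
        trace.append(f"Visit node {node}")
        if node == target:
            break
        # Expand neighbours in sorted order so the trace is deterministic.
        for nb in sorted(adj.get(node, ())):
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
                trace.append(f"  discover {nb} via {node}; queue = {list(queue)}")

    if target not in parent:
        trace.append(f"Node {target} is unreachable from {source}")
        return None, trace

    # Walk parent pointers back from the target to recover the path.
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()
    return path, trace


def make_trace_sample(edges, source, target):
    """Render one hypothetical question / execution-trace / answer sample."""
    path, trace = bfs_shortest_path_with_trace(edges, source, target)
    edge_text = ", ".join(f"({u}, {v})" for u, v in edges)
    answer = " -> ".join(map(str, path)) if path else "no path"
    question = (
        f"Question: In an undirected graph with edges {edge_text}, "
        f"find a shortest path from node {source} to node {target}."
    )
    return question + "\nExecution trace:\n" + "\n".join(trace) + f"\nAnswer: {answer}"


if __name__ == "__main__":
    print(make_trace_sample([(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)], 0, 4))
```

For the five-edge example graph, the search visits nodes 0, 1, 2, 3 and 4 in that order and the sample ends with `Answer: 0 -> 1 -> 3 -> 4`. Spelling out each queue update, rather than only the final path, is what would distinguish a trace-style sample from a plain question/answer pair in this sketch.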

