CL | May 20, 2025

Mapping the Minds of LLMs: A Graph-Based Analysis of Reasoning LLM

Zhen Xiong, Yujun Cai, Zhecheng Li, Yiwei Wang

arXiv:2505.13890v1 | 18.8 | 17 citations | h-index: 19 | EMNLP

Originality: Incremental advance

AI Analysis

This work addresses the unstable behaviors of Reasoning LLMs, giving researchers and practitioners a quantitative tool for evaluating reasoning quality and improving prompt engineering. It is an incremental advance, since it builds on existing CoT methods.

The paper tackles the problem of understanding and analyzing the reasoning processes of Large Language Models (LLMs). It introduces a graph-based framework that clusters Chain-of-Thought outputs into steps and constructs reasoning graphs, and finds that structural properties such as exploration density and branching ratios strongly correlate with reasoning accuracy.

Recent advances in test-time scaling have enabled Large Language Models (LLMs) to display sophisticated reasoning abilities via extended Chain-of-Thought (CoT) generation. Despite their potential, these Reasoning LLMs (RLMs) often demonstrate counterintuitive and unstable behaviors, such as performance degradation under few-shot prompting, that challenge our current understanding of RLMs. In this work, we introduce a unified graph-based analytical framework for better modeling the reasoning processes of RLMs. Our method first clusters long, verbose CoT outputs into semantically coherent reasoning steps, then constructs directed reasoning graphs to capture contextual and logical dependencies among these steps. Through comprehensive analysis across models and prompting regimes, we reveal that structural properties, such as exploration density, branching, and convergence ratios, strongly correlate with reasoning accuracy. Our findings demonstrate how prompting strategies substantially reshape the internal reasoning structure of RLMs, directly affecting task outcomes. The proposed framework not only enables quantitative evaluation of reasoning quality beyond conventional metrics but also provides practical insights for prompt engineering and the cognitive analysis of LLMs. Code and resources will be released to facilitate future research in this direction.
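The structural properties named in the abstract can be illustrated with a small sketch. The abstract does not give formulas, so the definitions below are assumptions made for illustration only: exploration density is taken as the number of edges divided by the number of possible directed edges, the branching ratio as the fraction of steps with more than one outgoing edge, and the convergence ratio as the fraction of steps with more than one incoming edge. The function name `graph_metrics` and the integer step indices are hypothetical, and the clustering of CoT text into steps is assumed to have been done already.

```python
from collections import defaultdict


def graph_metrics(num_steps, edges):
    """Structural metrics of a directed reasoning graph.

    num_steps: number of clustered reasoning steps (nodes 0..num_steps-1).
    edges: iterable of (source_step, target_step) pairs.
    All three metric definitions are assumptions, not taken from the paper.
    """
    # Drop self-loops and duplicate edges so each dependency counts once.
    unique_edges = {(u, v) for u, v in edges if u != v}

    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for u, v in unique_edges:
        out_deg[u] += 1
        in_deg[v] += 1

    if num_steps == 0:
        return {"density": 0.0, "branching": 0.0, "convergence": 0.0}

    # Assumed: density = |E| / (|V| * (|V| - 1)) for a directed graph.
    possible = num_steps * (num_steps - 1)
    density = len(unique_edges) / possible if possible else 0.0

    # Assumed: a step "branches" if it feeds more than one later step.
    branching = sum(1 for n in range(num_steps) if out_deg[n] > 1) / num_steps

    # Assumed: a step "converges" if it depends on more than one earlier step.
    convergence = sum(1 for n in range(num_steps) if in_deg[n] > 1) / num_steps

    return {"density": density, "branching": branching, "convergence": convergence}


# Toy graph: step 0 explores two alternatives (1 and 2) that merge at step 3.
toy_edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
metrics = graph_metrics(5, toy_edges)
print(metrics)
```

In this toy graph only step 0 branches and only step 3 converges, so both ratios are 1 out of 5 steps, and 5 of the 20 possible directed edges are present.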
