KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning?
This work provides a tool for analyzing reasoning mechanisms in LLMs. The contribution is incremental: it builds on existing chain-of-thought methods to offer new insights into model behavior.
The authors tackle the problem of understanding how chain-of-thought reasoning improves LLM performance. They introduce Causal CoT Graphs (CCGs) to model causal dependencies in reasoning traces, compile a dataset of 1,671 mathematical problems with their associated CCGs, and show that LLMs emphasize the reasoning paths given by these graphs, which indicates that the models internally realize such dependencies.
Chain-of-thought traces have been shown to improve performance of large language models in a plethora of reasoning tasks, yet there is no consensus on the mechanism through which this performance boost is achieved. To shed more light on this, we introduce Causal CoT Graphs (CCGs), which are directed acyclic graphs automatically extracted from reasoning traces that model fine-grained causal dependencies in the language model output. A collection of 1,671 mathematical reasoning problems from MATH500, GSM8K and AIME, and their associated CCGs are compiled into our dataset, KisMATH. Our detailed empirical analysis with 15 open-weight LLMs shows that (i) reasoning nodes in the CCG are mediators for the final answer, a condition necessary for reasoning; and (ii) LLMs emphasise reasoning paths given by the CCG, indicating that models internally realise structures akin to our graphs. KisMATH enables controlled, graph-aligned interventions and opens up avenues for further investigation into the role of chain-of-thought in LLM reasoning.
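To make the graph structure concrete, the following is a minimal sketch of what a CCG-like object could look like for a toy word problem. It is not the authors' extraction pipeline: the node labels, the `parents` dictionary, and the helpers `is_dag` and `reasoning_paths` are all hypothetical names chosen for illustration. The sketch only shows a directed acyclic graph whose reasoning nodes sit between question nodes and the final answer, and how the paths through those nodes can be enumerated.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical toy trace for: "Tom has 3 apples and buys 2 bags of 4 apples
# each. How many apples does he have?"
# Each key is a node; its value lists the nodes it depends on (its parents).
# Prefixes are an illustrative convention: q = question, r = reasoning, a = answer.
parents = {
    "q:3": [],
    "q:2": [],
    "q:4": [],
    "r:2*4=8": ["q:2", "q:4"],
    "r:3+8=11": ["q:3", "r:2*4=8"],
    "a:11": ["r:3+8=11"],
}


def is_dag(graph):
    """Return True if the dependency graph has no cycles."""
    try:
        # static_order() is lazy, so force it to surface any CycleError.
        tuple(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


def reasoning_paths(graph, answer):
    """Enumerate every path from a root node (no parents) to the answer."""
    def walk(node):
        if not graph[node]:
            return [[node]]
        return [path + [node] for p in graph[node] for path in walk(p)]

    return walk(answer)


if __name__ == "__main__":
    print(is_dag(parents))
    for path in reasoning_paths(parents, "a:11"):
        print(" -> ".join(path))
```

In this toy graph every path from a question node to the answer passes through at least one reasoning node, which is the sense in which such nodes act as mediators. A graph-aligned intervention would then target the nodes or edges along one of these paths.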