Uncovering Graph Reasoning in Decoder-only Transformers with Circuit Tracing
The paper offers a unified interpretability framework for researchers studying structural reasoning in AI models, though the contribution is incremental because it builds on the existing circuit-tracer framework.
The paper addresses the problem of understanding the internal mechanisms of decoder-only transformers on graph reasoning tasks. It identifies token merging and structural memorization as the core mechanisms and analyzes how they vary with graph density and model size.
Transformer-based LLMs demonstrate strong performance on graph reasoning tasks, yet their internal mechanisms remain underexplored. To uncover these mechanisms from a fundamental and unified view, we consider basic decoder-only transformers and explain them using the circuit-tracer framework. Through this lens, we visualize reasoning traces and identify two core mechanisms in graph reasoning, token merging and structural memorization, which underlie both path reasoning and substructure extraction tasks. We further quantify these behaviors and analyze how they are influenced by graph density and model size. Our study provides a unified interpretability framework for understanding structural reasoning in decoder-only transformers.
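To make the terms "path reasoning" and "graph density" concrete, the following is a minimal sketch of what a single path-reasoning example and its density might look like. It is an illustration under stated assumptions, not the paper's setup: the serialization layout (edge list, then a `Q` query marker, then the answer path), the helper names `graph_density`, `shortest_path`, and `serialize_example`, and the choice of a simple undirected graph are all hypothetical. Density here is the standard definition for simple undirected graphs, 2|E| / (|V|(|V| - 1)).

```python
from collections import deque


def graph_density(num_nodes, edges):
    """Density of a simple undirected graph: 2|E| / (|V| (|V| - 1))."""
    if num_nodes < 2:
        return 0.0
    return 2 * len(edges) / (num_nodes * (num_nodes - 1))


def shortest_path(edges, source, target):
    """Breadth-first search over an undirected edge list.

    Returns the list of nodes on one shortest path, or None if unreachable.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)

    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk parent pointers back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def serialize_example(edges, source, target):
    """Hypothetical token layout (an assumption, not the paper's format):
    edge list, then the query, then the answer path."""
    edge_tokens = " ".join(f"{u} {v} |" for u, v in edges)
    path = shortest_path(edges, source, target)
    answer = " ".join(str(n) for n in path) if path else "none"
    return f"{edge_tokens} Q {source} {target} : {answer}"


edges = [(0, 1), (1, 2), (2, 3), (0, 4)]
density = graph_density(5, edges)
example = serialize_example(edges, 0, 3)
```

In a sequence like this, a decoder-only model has to relate the two node tokens of each edge before it can chain edges into a path, which is the kind of behavior the abstract's "token merging" plausibly refers to. Adding edges while holding the node count fixed raises the density, the quantity the abstract reports varying.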