GraphGhost: Tracing Structures Behind Large Language Models
This work addresses the need for interpretability in AI by providing a tool to analyze and intervene in the structural foundations of reasoning in large language models, an incremental advance in model understanding.
The paper tackles the problem of understanding the structural mechanisms behind large language models' reasoning capabilities by introducing GraphGhost, a framework that represents neuron activations as graphs, enabling analysis and interventions that reveal key neuron nodes and their impact on reasoning.
Large Language Models (LLMs) demonstrate remarkable reasoning capabilities, yet the structural mechanisms underlying these abilities remain underexplored. In this work, we introduce GraphGhost, a unified framework that represents neuron activations and their signal propagation as graphs, explaining how LLMs capture structural semantics from sequential inputs and generate outputs through structurally consistent mechanisms. This graph-based perspective enables us to employ graph algorithms such as PageRank to characterize the properties of LLMs, revealing both shared and model-specific reasoning behaviors across diverse datasets. We further identify the activated neurons within GraphGhost and evaluate them through structural interventions, showing that edits to key neuron nodes can trigger reasoning collapse, altering both logical flow and semantic understanding. Together, these contributions position GraphGhost as a powerful tool for analyzing, intervening in, and ultimately understanding the structural foundations of reasoning in LLMs.
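To make the PageRank step concrete, the following is a minimal sketch of ranking nodes in a small directed graph by power iteration. It is not GraphGhost's actual graph construction: the node names (such as "L0:a"), the edge weights, and the function name `pagerank` are hypothetical, chosen only to illustrate how a graph of activation-propagation edges could be scored for node importance.

```python
def pagerank(edges, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank over a weighted directed edge list.

    edges: list of (source, target, weight) tuples with positive weights.
    Returns a dict mapping each node to its score; scores sum to 1.
    """
    nodes = sorted({n for src, dst, _ in edges for n in (src, dst)})
    count = len(nodes)

    # Total outgoing weight per node, used to normalize each edge.
    out_weight = {n: 0.0 for n in nodes}
    for src, _, w in edges:
        out_weight[src] += w

    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iters):
        # Nodes without outgoing edges spread their score uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for src, dst, w in edges:
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        delta = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy graph: two input-side nodes feed two middle nodes,
# which both feed a single output-side node. Weights are illustrative.
toy_edges = [
    ("L0:a", "L1:x", 1.0),
    ("L0:b", "L1:x", 1.0),
    ("L0:b", "L1:y", 0.5),
    ("L1:x", "L2:out", 1.0),
    ("L1:y", "L2:out", 1.0),
]

ranks = pagerank(toy_edges)
top_node = max(ranks, key=ranks.get)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{node}: {ranks[node]:.4f}")
```

In this toy graph the node that collects signal from every path receives the highest score, which is the kind of node a structural intervention would target first under this assumed setup.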