StructLens: A Structural Lens for Language Models via Maximum Spanning Trees
For interpretability researchers, this provides a new tool to understand how language models organize representations, though the findings are incremental and domain-specific.
StructLens introduces a framework using maximum spanning trees to analyze the structural organization of token representations in language models, revealing that middle layers exhibit the strongest local-span organization and that smaller units become detectable earlier in pre-training.
Language exhibits inherent structures, a property that explains both language acquisition and language change. Given this characteristic, we expect language models to manifest internal structures of their own. While interpretability research has investigated how models compute representations mechanistically, through attention patterns and sparse autoencoders, the organization of the resulting representations has been overlooked. To address this gap, we introduce StructLens, a framework for analyzing representations from a holistic structural view. StructLens constructs maximum spanning trees over the semantic representations in residual streams, inspired by the tree representations used in dependency parsing, and uses them to summarize token relationships in representation space. We analyze the extent to which contiguous tokens are also nearby in representation space and find that middle layers show the strongest local-span organization. Moreover, analysis of pre-training checkpoints reveals that smaller local units become detectable earlier in pre-training, and larger units later. Our findings demonstrate that StructLens provides insights into how models organize token representations across layers and training. Our code is available at https://github.com/naist-nlp/structlens.
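To make the core idea concrete, the following is a minimal sketch of building a maximum spanning tree over the token representations of a single layer. It is not the StructLens implementation: the use of cosine similarity as the edge weight, the function names (`cosine_similarity_matrix`, `maximum_spanning_tree`, `adjacent_edge_ratio`), and the adjacent-edge ratio as a stand-in for "local-span organization" are all assumptions made for illustration. The released code may define similarity and locality differently.

```python
import numpy as np


def cosine_similarity_matrix(hidden):
    """Pairwise cosine similarity between token vectors (rows of `hidden`).

    Assumption: cosine similarity is used as the edge weight; the abstract
    does not specify the similarity measure.
    """
    norms = np.linalg.norm(hidden, axis=1, keepdims=True)
    unit = hidden / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def maximum_spanning_tree(sim):
    """Prim's algorithm on a dense similarity matrix, maximizing edge weight.

    Returns a list of (source, target) token-index pairs with n - 1 edges.
    """
    n = sim.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best_sim = sim[0].astype(float)      # best similarity from the tree to each token
    best_src = np.zeros(n, dtype=int)    # tree node that achieves that similarity
    edges = []
    for _ in range(n - 1):
        # Pick the token outside the tree that is most similar to the tree.
        candidates = np.where(in_tree, -np.inf, best_sim)
        j = int(np.argmax(candidates))
        edges.append((int(best_src[j]), j))
        in_tree[j] = True
        # The newly added token may offer a better connection to the rest.
        improved = (sim[j] > best_sim) & ~in_tree
        best_sim[improved] = sim[j][improved]
        best_src[improved] = j
    return edges


def adjacent_edge_ratio(edges):
    """Hypothetical locality summary: share of tree edges linking neighboring tokens."""
    if not edges:
        return 0.0
    return sum(abs(i - j) == 1 for i, j in edges) / len(edges)


# Toy stand-in for one layer's residual-stream states: 5 tokens, 2 dimensions,
# arranged so that neighboring tokens point in similar directions.
angles = 0.3 * np.arange(5)
hidden = np.stack([np.cos(angles), np.sin(angles)], axis=1)

tree = maximum_spanning_tree(cosine_similarity_matrix(hidden))
locality = adjacent_edge_ratio(tree)
```

In practice, `hidden` would be replaced by the residual-stream states of one layer for one input sequence, and the summary would be computed per layer or per checkpoint to compare how locality changes across depth and training.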