LG · May 11

Tensor Product Representation Probes Reveal Shared Structure Across Linear Directions

arXiv:2605.09967 · 88.7
Predicted impact: top 9% in LG · last 90 days
Originality: Synthesis-oriented
AI Analysis

For interpretability researchers, this provides evidence that linear directions in language models may arise from underlying compositional representations, but the finding is incremental as it is demonstrated only in a synthetic domain.

The paper shows that in a language model trained on Othello, linear board-state representations are projections of more structured tensor product representations (TPRs). TPR probes recover shared structure among linear probes, yielding factorized embeddings and a binding matrix, and linear probes can be directly recovered from TPR parameters.

While researchers are finding concepts represented as linear directions in language models, a bag of linear directions fails to capture relational structure. To better understand this dichotomy, we study a model with known linear representations, but trained in a highly structured domain -- the board game Othello. While the model's internal board-state representation is linearly decodable, we find additional structure in the form of tensor product representations (TPRs). We train TPR probes to recover shared structure amongst the linear probes, yielding a factorization into square-embeddings, color-embeddings, and a binding matrix that composes them to construct the model's board-state representation. We find geometric signatures within the weights of our TPR probe that align with the structure of the board, but perhaps more importantly, that the linear probes can be recovered directly from the parameters of our TPR probe. Our findings suggest that directional representations may be projections of more structured underlying representations.
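To make the last claim concrete, here is a minimal sketch of how linear probes could fall out of TPR parameters. It is not the paper's implementation. It assumes a bilinear parameterization in which the probe direction for square s and color c is the binding matrix applied to the tensor product of a square-embedding and a color-embedding. The names R, F, B, all dimensions, and the random values are hypothetical placeholders, not trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 64 squares, 3 colors (empty / black / white).
n_squares, n_colors = 64, 3
d_model, d_role, d_filler = 32, 8, 4

# Assumed TPR probe parameters (random stand-ins, not trained values).
R = rng.normal(size=(n_squares, d_role))            # square-embeddings (roles)
F = rng.normal(size=(n_colors, d_filler))           # color-embeddings (fillers)
B = rng.normal(size=(d_model, d_role * d_filler))   # binding matrix

def tpr_logits(h):
    """Score every (square, color) pair for one hidden state h."""
    # Project h through B and view the result as a role-by-filler matrix.
    M = (h @ B).reshape(d_role, d_filler)
    # Contract with the square and color embeddings.
    return R @ M @ F.T                               # (n_squares, n_colors)

def recovered_linear_probes():
    """One linear direction per (square, color): B @ kron(R[s], F[c])."""
    bound = np.einsum("sr,cf->scrf", R, F).reshape(n_squares, n_colors, -1)
    return bound @ B.T                               # (n_squares, n_colors, d_model)

h = rng.normal(size=d_model)   # stand-in for a residual-stream activation
W = recovered_linear_probes()

# The recovered directions give the same scores as the TPR probe.
assert np.allclose(W @ h, tpr_logits(h))
```

Under this assumption, the 192 probe directions are not free parameters: each one is determined by one row of R, one row of F, and the shared matrix B, which is one way a "bag of linear directions" can be a projection of a more structured representation.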

