CL · Jul 19, 2025

Linear Relational Decoding of Morphology in Language Models

arXiv:2507.14640v1 · 11 citations · h-index: 10 · NAACL

Originality: Incremental advance

AI Analysis

This work offers insight into how language models encode morphology. Its contribution is incremental, since it builds on existing linear approximation techniques.

The paper addresses the problem of interpreting conceptual relationships in language models. It shows that a linear transformation can accurately reproduce final object states for morphological relations, reaching 90% faithfulness, and reports similar results across multiple languages and models.

A two-part affine approximation has been found to approximate transformer computations well for certain subject-object relations. Adapting the Bigger Analogy Test Set, we show that the linear transformation Ws, where s is a middle-layer representation of a subject token and W is derived from model derivatives, can also accurately reproduce final object states for many relations. This linear technique achieves 90% faithfulness on morphological relations, and we show similar findings multilingually and across models. Our findings indicate that some conceptual relationships in language models, such as morphology, are readily interpretable from latent space and are sparsely encoded by cross-layer linear transformations.
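The NumPy sketch below illustrates the idea of approximating a nonlinear subject-to-object computation with an affine map o ≈ Ws + b. It assumes that "two-part affine approximation" means a linear term plus a bias, and it takes W from Jacobians of the computation with respect to s.

The following pieces are all hypothetical stand-ins:

- The toy network `F`, which plays the role of the transformer layers between the subject state and the final object state.
- The unembedding matrix `U`.
- The choice to average Jacobians over a handful of training subjects. This loosely follows earlier linear-relational-embedding work, and the paper's exact estimation of W may differ.

Faithfulness is measured as the fraction of held-out subjects for which the affine map and the full computation decode to the same top token.

```python
import numpy as np

# Hypothetical stand-in for the transformer computation mapping a middle-layer
# subject state s (dim d) to a final-layer object state o (dim d).
rng = np.random.default_rng(0)
d, h, vocab = 16, 32, 50
W1 = rng.normal(scale=0.3, size=(h, d))
b1 = rng.normal(scale=0.1, size=h)
W2 = rng.normal(scale=0.3, size=(d, h))
b2 = rng.normal(scale=0.1, size=d)
U = rng.normal(size=(vocab, d))  # hypothetical unembedding: object state -> token logits


def F(s):
    """Toy nonlinear 'relation' computation o = F(s)."""
    return W2 @ np.tanh(W1 @ s + b1) + b2


def jacobian_F(s):
    """Exact Jacobian dF/ds = W2 @ diag(1 - tanh^2(W1 s + b1)) @ W1."""
    pre = W1 @ s + b1
    return W2 @ ((1.0 - np.tanh(pre) ** 2)[:, None] * W1)


def fit_affine(subjects):
    """Average the first-order approximations F(s) ~ J s + b over training subjects."""
    Js = [jacobian_F(s) for s in subjects]
    W = np.mean(Js, axis=0)
    b = np.mean([F(s) - J @ s for s, J in zip(subjects, Js)], axis=0)
    return W, b


def faithfulness(W, b, subjects):
    """Fraction of subjects where W s + b and F(s) decode to the same top token."""
    hits = [np.argmax(U @ (W @ s + b)) == np.argmax(U @ F(s)) for s in subjects]
    return float(np.mean(hits))


train = rng.normal(size=(8, d))
test = rng.normal(size=(200, d))
W, b = fit_affine(train)
print(f"faithfulness: {faithfulness(W, b, test):.2f}")
```

Fitting on a single subject reproduces F exactly at that point, because the bias absorbs the offset. Averaging over several subjects trades that exactness for a single map that generalizes across the relation.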
