CL | Sep 9, 2024

Representational Analysis of Binding in Language Models

arXiv:2409.05448v3 | 39 citations | h-index: 13
Originality: Incremental advance
AI Analysis

This paper provides a mechanistic explanation of binding in language models. The advance is incremental, but it clarifies a known bottleneck for AI researchers.

The paper addresses how language models bind entities to attributes during in-context entity tracking. It localizes the Ordering ID (OI) in the activations and shows its causal effect on binding behavior, so that editing the OI changes the binding (e.g., making 'Box Z' bind to 'stone' instead of 'coffee').

Entity tracking is essential for complex reasoning. To perform in-context entity tracking, language models (LMs) must bind an entity to its attribute (e.g., bind a container to its content) so that they can recall the attribute for a given entity. For example, given a context stating "The coffee is in Box Z, the stone is in Box M, the map is in Box H", an LM must bind "Box Z" to "coffee" in order to infer "Box Z contains the coffee" later. To explain the binding behaviour of LMs, existing research introduces a Binding ID mechanism and states that LMs use an abstract concept called Binding ID (BI) to internally mark entity-attribute pairs. However, that research has not captured the Ordering ID (OI) from entity activations that directly determines the binding behaviour. In this work, we provide a novel view of the BI mechanism by localizing OI and proving the causality between OI and binding behaviour. Specifically, by leveraging dimension reduction methods (e.g., PCA), we discover that there exists a low-rank subspace in the activations of LMs that primarily encodes the order (i.e., OI) of entities and attributes. Moreover, we also discover the causal effect of OI on binding: when representations are edited along the OI encoding direction, LMs tend to bind a given entity to other attributes accordingly. For example, by patching activations along the OI encoding direction, we can make the LM infer "Box Z contains the stone" and "Box Z contains the map".
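
The two steps named in the abstract, finding an order-encoding direction with PCA and then patching activations along it, can be illustrated with a small toy sketch. This is not the authors' implementation and uses no real LM activations: the data are synthetic vectors in which a single hidden direction is assumed to encode the ordering index, and every name here (`make_synthetic_activations`, `top_principal_direction`, `patch_along_direction`, the noise scale, the dimension of 64) is hypothetical and chosen only for illustration.

```python
import numpy as np


def make_synthetic_activations(n_per_position=50, n_positions=3, dim=64, seed=0):
    """Toy stand-in for entity activations (an assumption, not real LM data).

    Each vector is Gaussian noise plus a shift along one hidden direction
    that grows with the ordering index 0, 1, 2, ...
    """
    rng = np.random.default_rng(seed)
    oi_direction = rng.normal(size=dim)
    oi_direction /= np.linalg.norm(oi_direction)
    order_ids = np.repeat(np.arange(n_positions), n_per_position)
    noise = rng.normal(scale=0.3, size=(order_ids.size, dim))
    activations = noise + 4.0 * order_ids[:, None] * oi_direction
    return activations, order_ids, oi_direction


def top_principal_direction(activations):
    """First principal component, computed via SVD of the centered data."""
    centered = activations - activations.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]


def patch_along_direction(activation, direction, step):
    """Shift activations by `step` along a unit-norm direction."""
    return activation + step * direction


activations, order_ids, true_direction = make_synthetic_activations()
mean_activation = activations.mean(axis=0)

# Step 1: recover the order-encoding direction with PCA.
pc1 = top_principal_direction(activations)
projections = (activations - mean_activation) @ pc1

# The sign of a principal component is arbitrary; orient it so that a
# larger projection corresponds to a later ordering index.
if np.corrcoef(projections, order_ids)[0, 1] < 0:
    pc1 = -pc1
    projections = -projections

# Mean projection for each ordering index, and the gap between index 0 and 1.
class_means = np.array([projections[order_ids == k].mean() for k in range(3)])
spacing = class_means[1] - class_means[0]

# Step 2: patch the index-0 activations one step along the recovered direction.
patched = patch_along_direction(activations[order_ids == 0], pc1, spacing)
patched_projections = (patched - mean_activation) @ pc1

# Read off which ordering index each patched vector now resembles most.
nearest_index = np.abs(patched_projections[:, None] - class_means[None, :]).argmin(axis=1)

print("alignment with hidden direction:", abs(pc1 @ true_direction))
print("patched index-0 vectors now closest to index:", np.unique(nearest_index))
```

In this toy setting, moving a vector one step along the recovered direction makes it resemble the next ordering index, which is loosely analogous to the reported effect of making "Box Z" bind to "stone" rather than "coffee". In a real experiment the activations would come from a chosen layer of an LM, and the effect would be measured on the model's outputs rather than on projections.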

