LG · Jul 19, 2024

Investigating the Indirect Object Identification circuit in Mamba

arXiv:2407.14008v2 · 2 citations · h-index: 11
Originality: Synthesis-oriented
AI Analysis

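This work provides initial evidence that mechanistic interpretability tools can generalize to new architectures such as Mamba. It addresses an open question for interpretability researchers: whether techniques developed before Mamba transfer to architectures other than the Transformer.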

The authors adapted existing interpretability techniques to the Mamba architecture and partially reverse-engineered the circuit for the Indirect Object Identification task. They identify Layer 39 as a key bottleneck and find that name entities are stored linearly in that layer's SSM.
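The linear-storage finding concerns the sequence-mixing path inside each Mamba layer, which is a short causal convolution followed by a selective state-space model (SSM). Below is a minimal pure-Python sketch of that path for a single channel. It is not the authors' code, and several of its choices are assumptions made for illustration:

- the width-4 kernel;
- the fixed scalar decay `a`, input weight `b`, and readout `c`.

Real Mamba makes the SSM parameters input-dependent and runs many channels with learned projections. The sketch shows two properties. First, a kernel that weights only the previous position copies each token's value one step forward. Second, with the SSM parameters held fixed, the state is linear in its inputs, so contributions from different tokens add.

```python
"""Toy, single-channel sketch of the sequence-mixing path in a Mamba layer.

Assumptions (illustrative, not from the paper): kernel width 4, and a scalar
SSM whose decay a, input weight b, and readout c are fixed numbers instead of
the input-dependent (selective) values real Mamba computes.
"""
from typing import List


def causal_conv1d(x: List[float], kernel: List[float]) -> List[float]:
    """Causal convolution for one channel, as with left padding of K-1 zeros.

    out[t] = sum_j kernel[j] * x[t - (K - 1) + j], so position t only sees
    positions t-K+1 .. t; kernel[K-1] weights the current position.
    """
    k = len(kernel)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            src = t - (k - 1) + j
            if src >= 0:  # positions before the sequence start are zero
                acc += w * x[src]
        out.append(acc)
    return out


def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Scalar linear recurrence: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t."""
    h = 0.0
    ys = []
    for xt in x:
        h = a * h + b * xt
        ys.append(c * h)
    return ys


# A kernel with all its weight on the previous position shifts the sequence
# one step forward: the value at position t-1 appears at position t.
shift_kernel = [0.0, 0.0, 1.0, 0.0]
tokens = [1.0, 2.0, 3.0, 4.0]
print(causal_conv1d(tokens, shift_kernel))  # [0.0, 1.0, 2.0, 3.0]

# With a, b, c held fixed, the scan is linear in its inputs: the output for
# two "name" signals together equals the sum of their separate outputs.
name_a = [1.0, 0.0, 0.0, 0.0]
name_b = [0.0, 0.0, 1.0, 0.0]
both = [p + q for p, q in zip(name_a, name_b)]
ya = ssm_scan(name_a, 0.5, 1.0, 1.0)
yb = ssm_scan(name_b, 0.5, 1.0, 1.0)
yab = ssm_scan(both, 0.5, 1.0, 1.0)
print(all(abs(s - (p + q)) < 1e-12 for s, p, q in zip(yab, ya, yb)))  # True
```

The linearity shown here holds only while the SSM parameters are fixed. In Mamba they depend on the input, so this toy only illustrates why a linear description of stored names is plausible. It does not reproduce the paper's measurement.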

How well will current interpretability techniques generalize to future models? A relevant case study is Mamba, a recent recurrent architecture whose scaling is comparable to that of Transformers. We adapt pre-Mamba techniques to Mamba and partially reverse-engineer the circuit responsible for the Indirect Object Identification (IOI) task. Our techniques provide evidence that 1) Layer 39 is a key bottleneck, 2) convolutions in Layer 39 shift names one position forward, and 3) the name entities are stored linearly in Layer 39's SSM. Finally, we adapt an automatic circuit discovery tool, positional Edge Attribution Patching, to identify a Mamba IOI circuit. Our contributions provide initial evidence that circuit-based mechanistic interpretability tools work well for the Mamba architecture.
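Edge attribution patching (EAP) replaces one patched forward pass per edge with a first-order estimate. For an edge u → v, the estimated change in a metric is:

- the difference between u's corrupted and clean activations,
- dotted with the gradient of the metric with respect to v's input on the clean run.

The sketch below assumes that the positional variant keeps one score per token position instead of summing over the sequence. The node names (`a`, `b`, `logits`), activations, and gradients are hypothetical toy values. The metric is linear, so the estimate can be checked against exact patching. In a real run, the gradients would come from autograd on a metric such as the IOI logit difference, and a circuit would be read off by keeping the highest-scoring edges.

```python
"""Sketch of positional edge attribution patching (EAP) scores.

Assumptions (hypothetical, not the paper's code): toy node names, activations,
and gradients; each downstream node reads the sum of its upstream outputs, so
the gradient for an edge equals the gradient at the downstream node's input.
"""
from typing import Dict, List, Tuple

Vec = List[float]
Acts = Dict[str, List[Vec]]  # node -> per-position activation vectors


def dot(u: Vec, v: Vec) -> float:
    return sum(p * q for p, q in zip(u, v))


def positional_eap(clean: Acts, corrupt: Acts, input_grads: Acts,
                   edges: List[Tuple[str, str]]) -> Dict[Tuple[str, str, int], float]:
    """score(u -> v, pos) = (corrupt[u][pos] - clean[u][pos]) . dMetric/d(input of v at pos).

    Gradients come from the clean run; scores stay separate per position.
    """
    scores = {}
    for up, down in edges:
        for pos, (c_vec, x_vec) in enumerate(zip(clean[up], corrupt[up])):
            delta = [x - c for c, x in zip(c_vec, x_vec)]
            scores[(up, down, pos)] = dot(delta, input_grads[down][pos])
    return scores


# Toy setup: two positions, upstream nodes "a" and "b" feeding "logits".
clean = {"a": [[1.0, 0.0], [0.0, 1.0]], "b": [[0.5, 0.5], [1.0, 0.0]]}
corrupt = {"a": [[1.0, 0.0], [2.0, 1.0]], "b": [[0.5, 0.5], [1.0, -1.0]]}
w = [[1.0, -1.0], [0.5, 2.0]]  # metric = sum_pos w[pos] . (a[pos] + b[pos])
edges = [("a", "logits"), ("b", "logits")]
scores = positional_eap(clean, corrupt, {"logits": w}, edges)


def exact_patch_effect(up: str, pos: int) -> float:
    """Exact metric change from patching one upstream output at one position (toy linear model)."""
    def metric(acts: Acts) -> float:
        return sum(dot([p + q for p, q in zip(acts["a"][t], acts["b"][t])], w[t])
                   for t in range(len(w)))
    patched = {k: [list(v) for v in vs] for k, vs in clean.items()}
    patched[up][pos] = list(corrupt[up][pos])
    return metric(patched) - metric(clean)


print(scores[("a", "logits", 1)], exact_patch_effect("a", 1))  # 1.0 1.0
for (up, down, pos), s in sorted(scores.items(), key=lambda kv: -abs(kv[1])):
    print(f"{up} -> {down} @ pos {pos}: {s:+.2f}")
```

Because the toy metric is linear, the attribution matches exact patching. With a nonlinear model and metric, the scores are only an approximation and are usually checked by patching the selected circuit directly.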

Code Implementations: 1 repo
