LG · AI · CL · May 12

WriteSAE: Sparse Autoencoders for Recurrent State

arXiv:2605.12770 · 81.2
Predicted impact: top 12% in LG (last 90 days) · Originality: highly original
AI Analysis

For researchers working on mechanistic interpretability and model editing in recurrent and hybrid language models, this paper addresses the problem of decomposing and editing the matrix cache write, a site that residual-stream sparse autoencoders (SAEs) could not previously access.

WriteSAE is the first sparse autoencoder that decomposes and edits the matrix cache write of recurrent state-space and hybrid language models, a site that residual SAEs cannot reach. It achieves 92.4% atom-substitution success on Qwen3.5-0.8B and 88.1% on Mamba-2-370M, predicts logit shifts in closed form with R²=0.98, and enables the first behavioral install at the matrix-recurrent write site, lifting target-in-continuation from 33.3% to 100% under greedy decoding.
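The shape argument can be made concrete with a small NumPy sketch. It assumes, as our own reading rather than the authors' code, that a factored decoder atom is stored as a pair of factors $(a, b)$ whose outer product $a b^\top$ has the same $d_k \times d_v$ shape as the native write $k_t v_t^\top$, and that substitution rescales the atom to the write's Frobenius norm. The helpers `cache_write`, `unit_atom`, and `substitute_write` are hypothetical names. A residual-stream atom, by contrast, is a single vector and cannot stand in for this matrix.

```python
import numpy as np

def cache_write(k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Hypothetical helper: rank-1 write k v^T into a (d_k, d_v) matrix cache."""
    return np.outer(k, v)

def unit_atom(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical factored atom a b^T, scaled to unit Frobenius norm."""
    atom = np.outer(a, b)
    return atom / np.linalg.norm(atom)  # for a 2-D array the default norm is Frobenius

def substitute_write(k: np.ndarray, v: np.ndarray,
                     a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Swap the write k v^T for the atom a b^T at matched Frobenius norm (assumed scheme)."""
    # For an outer product, ||k v^T||_F = ||k|| * ||v||.
    target_norm = np.linalg.norm(k) * np.linalg.norm(v)
    return target_norm * unit_atom(a, b)

rng = np.random.default_rng(0)
d_k, d_v = 8, 16
k, v = rng.normal(size=d_k), rng.normal(size=d_v)  # the model's own key/value
a, b = rng.normal(size=d_k), rng.normal(size=d_v)  # illustrative atom factors

original = cache_write(k, v)
swapped = substitute_write(k, v, a, b)
print(swapped.shape)                   # (8, 16): same shape as the native write
print(np.linalg.matrix_rank(swapped))  # 1: still a single rank-1 update
print(np.isclose(np.linalg.norm(swapped), np.linalg.norm(original)))  # True
```

Because the replacement keeps both the rank-1 shape and the Frobenius norm, only the direction of the write changes, which is what lets a single cache slot be swapped in isolation.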

We introduce WriteSAE, the first sparse autoencoder that decomposes and edits the matrix cache write of state-space and hybrid recurrent language models, a site that residual SAEs cannot reach. Existing SAEs read residual streams, but Gated DeltaNet, Mamba-2, and RWKV-7 write to a $d_k \times d_v$ cache through rank-1 updates $k_t v_t^\top$ that no vector atom can replace. WriteSAE factors each decoder atom into the native write shape, exposes a closed form for the per-token logit shift, and trains under matched Frobenius norm so that atoms swap one cache slot at a time. At Qwen3.5-0.8B layer 9, head 4 (L9 H4), atom substitution beats matched-norm ablation on 92.4% of $n=4{,}851$ firings; the 87-atom population test holds at 89.8%; the closed form predicts measured effects at $R^2=0.98$; and on Mamba-2-370M, substitution succeeds on 88.1% of 2,500 firings. Sustained three-position installs at $3\times$ lift midrank target-in-continuation from 33.3% to 100% under greedy decoding, the first behavioral install at the matrix-recurrent write site.
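The abstract mentions a closed form for the per-token logit shift without stating it. The sketch below derives one for a deliberately simplified setting that is our assumption, not the paper's model: a scalar-gated linear recurrence $S_t = \alpha_t S_{t-1} + k_t v_t^\top$ (no delta-rule correction, no normalization), a readout $o_t = S_t^\top q_t$, and a hypothetical linear map $W$ standing in for everything between the readout and the logits. Under those assumptions, swapping the write at position $s$ for an atom $a b^\top$ shifts the logits at any $t \ge s$ by

$$\Delta\ell_t = \Big(\prod_{j=s+1}^{t} \alpha_j\Big)\, W\big(b\,(a^\top q_t) - v_s\,(k_s^\top q_t)\big).$$

The functions `run_cache`, `logits_at`, and `predicted_logit_shift` are hypothetical illustrations.

```python
import numpy as np

def run_cache(K: np.ndarray, V: np.ndarray, alphas: np.ndarray) -> list:
    """Hypothetical toy recurrence S_t = alpha_t * S_{t-1} + k_t v_t^T; returns every S_t."""
    S = np.zeros((K.shape[1], V.shape[1]))
    states = []
    for t in range(K.shape[0]):
        S = alphas[t] * S + np.outer(K[t], V[t])
        states.append(S.copy())
    return states

def logits_at(S: np.ndarray, q: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Read out o = S^T q, then map to logits with a linear stand-in W of shape (vocab, d_v)."""
    return W @ (S.T @ q)

def predicted_logit_shift(k_s, v_s, a, b, q_t, W, alphas, s, t):
    """Closed form for replacing the write at s by a b^T, evaluated at t >= s (toy model only)."""
    decay = np.prod(alphas[s + 1 : t + 1])     # empty product is 1.0 when t == s
    delta_out = b * (a @ q_t) - v_s * (k_s @ q_t)  # (a b^T - k_s v_s^T)^T q_t
    return decay * (W @ delta_out)

rng = np.random.default_rng(1)
T, d_k, d_v, vocab = 6, 4, 5, 10
K, V, Q = rng.normal(size=(T, d_k)), rng.normal(size=(T, d_v)), rng.normal(size=(T, d_k))
alphas = rng.uniform(0.8, 1.0, size=T)
W = rng.normal(size=(vocab, d_v))
s, t = 2, 5

# Illustrative atom, rescaled so ||a b^T||_F matches ||k_s v_s^T||_F.
a, b = rng.normal(size=d_k), rng.normal(size=d_v)
a *= np.linalg.norm(K[s]) * np.linalg.norm(V[s]) / (np.linalg.norm(a) * np.linalg.norm(b))

# Substituting the write at s is the same as swapping (k_s, v_s) for (a, b).
K_sub, V_sub = K.copy(), V.copy()
K_sub[s], V_sub[s] = a, b

states, states_sub = run_cache(K, V, alphas), run_cache(K_sub, V_sub, alphas)
measured = logits_at(states_sub[t], Q[t], W) - logits_at(states[t], Q[t], W)
predicted = predicted_logit_shift(K[s], V[s], a, b, Q[t], W, alphas, s, t)
print(np.allclose(measured, predicted))  # True
```

In this purely linear toy the closed form is exact up to floating-point error. In a real hybrid model, gating details, normalization, and downstream layers make any such prediction approximate, which may be why the abstract reports a fit of $R^2=0.98$ rather than an exact match.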

