LGJun 25, 2025

Multiple Streams of Knowledge Retrieval: Enriching and Recalling in Transformers

arXiv:2506.20746v21 citationsh-index: 2
Originality Incremental advance
AI Analysis

This work addresses the problem of understanding knowledge retrieval mechanisms in LLMs for researchers in interpretability and model design, though it is incremental as it builds on existing localization techniques.

The paper investigates how LLMs store and retrieve new facts learned during finetuning, identifying two separate pathways: 'enriching' the residual stream with relation information during entity processing and 'recalling' it at the final token before generation, with models sometimes requiring both pathways for correct fact generation.

When an LLM learns a new fact during finetuning (e.g., new movie releases, newly elected pope, etc.), where does this information go? Are entities enriched with relation information, or do models recall information just-in-time before a prediction? Or, are ``all of the above'' true with LLMs implementing multiple redundant heuristics? Existing localization approaches (e.g., activation patching) are ill-suited for this analysis because they usually \textit{replace} parts of the residual stream, thus overriding previous information. To fill this gap, we propose \emph{dynamic weight grafting}, a technique that selectively grafts weights from a finetuned model onto a pretrained model. Using this technique, we show two separate pathways for retrieving finetuned relation information: 1) ``enriching" the residual stream with relation information while processing the tokens that correspond to an entity (e.g., ``Zendaya'' in ``Zendaya co-starred with John David Washington'') and 2) ``recalling" this information at the final token position before generating a target fact. In some cases, models need information from both of these pathways to correctly generate finetuned facts while, in other cases, either the ``enrichment" or ``recall" pathway alone is sufficient. We localize the ``recall'' pathway to model components -- finding that ``recall" occurs via both task-specific attention mechanisms and an entity-specific extraction step in the feedforward networks of the final layers before the target prediction. By targeting model components and parameters, as opposed to just activations, we are able to understand the \textit{mechanisms} by which finetuned knowledge is retrieved during generation.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes