CL, AI, LG | Feb 23, 2024

MemoryPrompt: A Light Wrapper to Improve Context Tracking in Pre-trained Language Models

arXiv:2402.15268v182 citations | h-index: 6 | LREC
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of efficient context tracking in large language models, though it is an incremental improvement over existing prompt-based methods.

The paper tackles context tracking in pre-trained language models by introducing MemoryPrompt, a lightweight auxiliary recurrent network that passes information to the LM without finetuning it. The results show that MemoryPrompt-augmented LMs outperform much larger models with access to the full input history on a fact-update task, and perform comparably to a model conditioned on the entire conversation history on a long-distance dialogue dataset, while avoiding catastrophic forgetting.

Transformer-based language models (LMs) track contextual information through large, hard-coded input windows. We introduce MemoryPrompt, a leaner approach in which the LM is complemented by a small auxiliary recurrent network that passes information to the LM by prefixing its regular input with a sequence of vectors, akin to soft prompts, without requiring LM finetuning. Tested on a task designed to probe an LM's ability to keep track of multiple fact updates, a MemoryPrompt-augmented LM outperforms much larger LMs that have access to the full input history. We also test MemoryPrompt on a long-distance dialogue dataset, where its performance is comparable to that of a model conditioned on the entire conversation history. In both experiments we also observe that, unlike full-finetuning approaches, MemoryPrompt does not suffer from catastrophic forgetting when adapted to new tasks, thus not disrupting the generalist capabilities of the underlying LM.
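
The prefixing mechanism described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper or its repository: the stand-in `frozen_lm` (a fixed embedding table with mean pooling), the Elman-style recurrent cell, the sizes `D` and `N_MEM`, and the zero-vector prefix for the first segment. The only point it tries to show is the data flow: the LM stays frozen, a small recurrent module reads the LM's output for one segment, and the memory vectors it produces are prepended to the next segment's input.

```python
import math
import random

D = 4        # hidden size of the stand-in LM (hypothetical)
N_MEM = 2    # number of memory vectors prefixed to each segment (hypothetical)

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Frozen stand-in LM: a fixed embedding table for a vocabulary of 10 token ids.
EMBED = rand_matrix(random.Random(0), 10, D)

def frozen_lm(prefix_vectors, token_ids):
    """Toy replacement for a frozen LM: reads the prefix vectors followed by
    the token embeddings and returns one summary vector (their mean)."""
    inputs = prefix_vectors + [EMBED[t] for t in token_ids]
    return [sum(col) / len(inputs) for col in zip(*inputs)]

class MemoryModule:
    """Small recurrent network; the only part that would be trained."""

    def __init__(self, seed=1):
        rng = random.Random(seed)
        self.W_in = rand_matrix(rng, D, D)
        self.W_rec = rand_matrix(rng, D, D)
        self.W_out = [rand_matrix(rng, D, D) for _ in range(N_MEM)]
        self.state = [0.0] * D

    def step(self, lm_hidden):
        # Elman-style update (an assumption, chosen for brevity).
        a = matvec(self.W_in, lm_hidden)
        b = matvec(self.W_rec, self.state)
        self.state = [math.tanh(x + y) for x, y in zip(a, b)]
        # Project the recurrent state into N_MEM soft-prompt-like vectors.
        return [matvec(W, self.state) for W in self.W_out]

def run(segments):
    """Process segments in order, carrying memory vectors between them."""
    memory = MemoryModule()
    prefix = [[0.0] * D for _ in range(N_MEM)]  # empty memory at the start
    outputs = []
    for seg in segments:
        hidden = frozen_lm(prefix, seg)   # the LM sees memory + current segment
        outputs.append(hidden)
        prefix = memory.step(hidden)      # memory for the next segment
    return outputs, prefix

outputs, prefix = run([[1, 2, 3], [4, 5], [6]])
print(len(outputs), len(prefix))
```

In this sketch the LM never sees earlier segments directly; anything it knows about them has to arrive through the `N_MEM` prefix vectors, which is the sense in which the input history is replaced by a compact memory.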

Code Implementations: 1 repo
