A Parametric Memory Head for Continual Generative Retrieval

Kidist Amde Mekonnen, Yubao Tang, Maarten de Rijke

arXiv:2604.23388, 60.11 citations, h-index: 3

AI Analysis

For practitioners deploying generative retrieval in dynamic environments, this work addresses the stability-plasticity trade-off with a memory-based approach that reduces forgetting without full retraining.

Generative retrieval models suffer catastrophic forgetting when they are adapted sequentially to a growing document collection. The proposed post-adaptation memory tuning (PAMT) method attaches a parametric memory head to the adapted model and improves retention on earlier slices by up to 20% while maintaining performance on new documents. It modifies only a sparse subset of memory values per session.

Generative information retrieval (GenIR) consolidates retrieval into a single neural model that decodes document identifiers (docids) directly from queries. While this model-as-index paradigm offers architectural simplicity, it is poorly suited to dynamic document collections. Unlike modular systems, where indexes are easily updated, GenIR's knowledge is parametrically encoded in its weights; consequently, standard adaptation methods such as full and parameter-efficient fine-tuning can induce catastrophic forgetting. We show that sequential adaptation improves retrieval on newly added documents but substantially degrades performance on earlier slices, exposing a pronounced stability-plasticity trade-off. To address this, we propose post-adaptation memory tuning (PAMT), a memory-only stabilization stage that augments an adapted model with a modular parametric memory head (PMH). PAMT freezes the backbone and attaches a product-key memory with fixed addressing. During prefix-trie constrained decoding, decoder hidden states sparsely query PMH to produce residual corrections in hidden space; these corrections are mapped to score adjustments via the frozen output embedding matrix, computed only over trie-valid tokens. This guides docid generation while keeping routing and backbone parameters fixed. To limit cross-slice interference, PAMT updates only a fixed budget of memory values selected using decoding-time access statistics, prioritizing entries frequently activated by the current slice and rarely used in prior sessions. Experiments on MS MARCO and Natural Questions under sequential, disjoint corpus increments show that PAMT substantially improves retention on earlier slices with minimal impact on retrieval performance for newly added documents, while modifying only a sparse subset of memory values per session.
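The decoding-time mechanism described in the abstract can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: the function names (`pkm_lookup`, `adjust_logits`, `select_slots`), the softmax weighting of retrieved values, and the ratio used to rank memory slots are all assumptions made for illustration. The abstract states only that addressing is fixed, that reads are sparse, that corrections pass through the frozen output embedding over trie-valid tokens, and that updates favor entries used often by the current slice and rarely in prior sessions.

```python
import math
from heapq import nlargest


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pkm_lookup(h, sub_keys_1, sub_keys_2, values, k=2):
    """Sparse product-key read; returns (residual, accessed slot ids).

    The sub-key tables are treated as frozen ("fixed addressing");
    only `values` would be trainable in this sketch.
    """
    half = len(h) // 2
    q1, q2 = h[:half], h[half:]
    # Score each query half against its own sub-key table, keep the top-k.
    top1 = nlargest(k, ((dot(q1, key), i) for i, key in enumerate(sub_keys_1)))
    top2 = nlargest(k, ((dot(q2, key), j) for j, key in enumerate(sub_keys_2)))
    # Combine the k x k candidate pairs; slot id = i * n2 + j.
    n2 = len(sub_keys_2)
    candidates = [(s1 + s2, i * n2 + j) for s1, i in top1 for s2, j in top2]
    best = nlargest(k, candidates)
    # Assumption: softmax over the selected slot scores weights the values.
    top_score = max(s for s, _ in best)
    exps = [math.exp(s - top_score) for s, _ in best]
    total = sum(exps)
    residual = [0.0] * len(values[0])
    for e, (_, slot) in zip(exps, best):
        for d, v in enumerate(values[slot]):
            residual[d] += (e / total) * v
    return residual, [slot for _, slot in best]


def adjust_logits(base_logits, residual, output_embeddings, valid_tokens):
    """Add E[t] . residual to the logit of each trie-valid token only."""
    return {
        t: base_logits[t] + dot(output_embeddings[t], residual)
        for t in valid_tokens
    }


def select_slots(current_counts, prior_counts, budget):
    """Pick `budget` slots to update for the current slice.

    Assumed score: current accesses / (1 + prior accesses), so slots that
    are hot now and cold in earlier sessions rank first.
    """
    def score(slot):
        return current_counts[slot] / (1.0 + prior_counts.get(slot, 0))

    return sorted(nlargest(budget, current_counts, key=score))
```

In a session, one would call `pkm_lookup` at each decoding step, pass the residual to `adjust_logits` together with the token ids the prefix trie allows at that step, and accumulate the returned slot ids into the access counts that `select_slots` consumes.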
