ICICLE: Expanding Retrieval with In-Context Documents
For practitioners of generative retrieval, ICICLE offers a practical solution to incrementally expand the corpus without retraining, though it is an incremental improvement over existing methods.
ICICLE addresses the problem of costly corpus expansion in generative retrieval by treating new documents as in-context evidence at inference time, avoiding retraining. It achieves improved retrieval of new documents while maintaining performance on existing ones on MS MARCO and NQ320K.
Generative retrieval (GR) maps queries directly to document identifiers (docids) using parametric knowledge. However, this design makes corpus expansion costly: adding new documents requires updating model parameters to encode new document-docid associations, which incurs repeated training and catastrophic forgetting of previously indexed documents. In this work, we revisit incremental GR as an in-context retrieval problem, where newly added documents are supplied as inference-time document-docid evidence. We propose ICICLE, an in-context indexing framework that performs source-aware docid generation over both parametric memory and context-provided document-docid pairs. ICICLE combines a `[COPY]`-based routing mechanism, preference-based calibration, and large-context adaptation to distinguish context-grounded retrieval from parametric retrieval. Experiments on MS MARCO and NQ320K show that ICICLE improves retrieval of newly introduced documents while preserving seen-document retention without corpus-specific retraining. Our analysis further shows that high-shot degradation is mainly caused by routing failure, highlighting source-selection calibration as a key bottleneck for scaling in-context generative retrieval.
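
To make the idea of source-aware docid generation concrete, the following is a toy sketch, not the ICICLE implementation. It assumes a lexical-overlap score in place of a learned router, a plain dictionary in place of the model's parametric memory, and a hypothetical `[PARAM]` label for the parametric route. The names `ContextDoc`, `token_overlap`, `retrieve`, and the `threshold` parameter are all illustrative assumptions. Only the `[COPY]` marker comes from the abstract.

```python
from dataclasses import dataclass
from typing import Optional

COPY_TOKEN = "[COPY]"    # routing marker named in the abstract
PARAM_TOKEN = "[PARAM]"  # hypothetical label for the parametric route


@dataclass(frozen=True)
class ContextDoc:
    """A newly added document supplied at inference time with its docid."""
    docid: str
    text: str


def token_overlap(query: str, text: str) -> float:
    """Fraction of query tokens that also appear in text (toy relevance score)."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def retrieve(
    query: str,
    context_docs: list[ContextDoc],
    parametric_index: dict[str, str],
    threshold: float = 0.5,
) -> tuple[str, Optional[str]]:
    """Return (route, docid).

    If some context document matches the query well enough, copy its docid
    from the context. Otherwise fall back to the stand-in parametric lookup,
    which returns None when the query is unknown.
    """
    best = max(
        context_docs,
        key=lambda d: token_overlap(query, d.text),
        default=None,
    )
    if best is not None and token_overlap(query, best.text) >= threshold:
        return COPY_TOKEN, best.docid
    return PARAM_TOKEN, parametric_index.get(query.lower())


# Assumed example data: one new document in context, one "memorized" query.
context = [ContextDoc("D900", "icicle expands generative retrieval with in-context documents")]
parametric = {"capital of france": "D17"}

new_doc_result = retrieve("in-context generative retrieval", context, parametric)
seen_doc_result = retrieve("capital of france", context, parametric)
```

In this sketch the first query is routed to the context and the second to the parametric stand-in. A poorly chosen `threshold` sends queries to the wrong source, which is a simplified analogue of the routing failure the abstract identifies as the main cause of high-shot degradation.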