IR · CL · LG · Jul 19, 2023

IncDSI: Incrementally Updatable Document Retrieval

CMU
arXiv:2307.10323v2 · 19 citations · h-index: 80 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a significant limitation of real-time document retrieval systems by enabling incremental updates. It is an incremental advance, however, because it builds on the existing DSI paradigm.

The paper tackles the problem of adding new documents to Differentiable Search Index (DSI) models without retraining. It proposes IncDSI, which adds documents in real time (20-50 ms per document) and achieves performance competitive with full retraining.

Differentiable Search Index is a recently proposed paradigm for document retrieval that encodes information about a corpus of documents within the parameters of a neural network and directly maps queries to the corresponding documents. These models have achieved state-of-the-art performance for document retrieval across many benchmarks. However, they have a significant limitation: it is not easy to add new documents after a model is trained. We propose IncDSI, a method to add documents in real time (about 20-50 ms per document) without retraining the model on the entire dataset (or even parts of it). Instead, we formulate the addition of documents as a constrained optimization problem that makes minimal changes to the network parameters. Although orders of magnitude faster, our approach is competitive with retraining the model on the whole dataset, and it enables document retrieval systems that can be updated with new information in real time. Our code for IncDSI is available at https://github.com/varshakishore/IncDSI.
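The constrained-optimization idea can be illustrated with a toy model. This is a minimal sketch under several assumptions. First, the DSI model uses atomic docids, so its final layer holds one vector per document and a query is routed to the document with the highest dot-product score. Second, the query encoder and all existing docid vectors stay frozen, and only a single new docid vector is learned.

The constraints are relaxed into hinge penalties:

* Queries for the new document must score it at least `margin` above every old document.
* Old queries must still prefer their own document by at least `margin`.

An L2 penalty stands in for "minimal change." The names `add_document`, `retrieve`, `margin`, `lam`, `lr`, and `steps`, the 3-dimensional query embeddings, and the subgradient-descent solver are hypothetical illustrative choices. They are not necessarily IncDSI's exact objective or the repository's API.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def add_document(
    doc_vecs: List[Vector],
    new_queries: List[Vector],
    old_queries: List[Tuple[Vector, int]],
    margin: float = 0.1,   # hypothetical required score gap
    lam: float = 0.01,     # hypothetical L2 weight ("minimal change")
    lr: float = 0.1,
    steps: int = 200,
) -> Vector:
    """Learn one new docid vector while the encoder and old docid vectors stay frozen.

    Hinge penalties approximate the constraints:
      * each new query must score the new doc >= margin above every old doc;
      * each old query must still score its own doc >= margin above the new doc.
    """
    v = [0.0] * len(doc_vecs[0])
    # Scores that do not depend on v can be computed once, since old parameters are frozen.
    best_old = [max(dot(d, q) for d in doc_vecs) for q in new_queries]
    own_score = [dot(doc_vecs[label], q) for q, label in old_queries]
    for _ in range(steps):
        grad = [2.0 * lam * x for x in v]  # gradient of lam * ||v||^2
        for q, m in zip(new_queries, best_old):
            if dot(v, q) - m < margin:  # new doc not yet winning by the margin
                grad = [g - x for g, x in zip(grad, q)]
        for (q, _), s in zip(old_queries, own_score):
            if s - dot(v, q) < margin:  # new doc would steal an old query
                grad = [g + x for g, x in zip(grad, q)]
        v = [x - lr * g for x, g in zip(v, grad)]
    return v


def retrieve(doc_vecs: List[Vector], query: Vector) -> int:
    """Return the index of the highest-scoring docid vector for a query embedding."""
    scores = [dot(d, query) for d in doc_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example: two indexed documents, then one new document added incrementally.
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
old_qs = [([0.9, 0.1, 0.0], 0), ([0.1, 0.9, 0.0], 1)]
new_qs = [[0.1, 0.1, 1.0], [0.0, 0.2, 0.9]]
docs.append(add_document(docs, new_qs, old_qs))
print([retrieve(docs, q) for q in new_qs])     # [2, 2]
print([retrieve(docs, q) for q, _ in old_qs])  # [0, 1]
```

Only one vector is optimized per added document, and no old parameter is touched. That is why an update of this kind can be cheap compared with retraining on the whole corpus.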

Code Implementations: 1 repo