IR, AI, CL, LG · Aug 4, 2024

Generative Retrieval with Few-shot Indexing

arXiv:2408.02152v3 · 7 citations · h-index: 18
AI Analysis

This paper addresses inefficient and inflexible retrieval for users who need access to dynamic document collections. The contribution is incremental, since it builds on existing generative retrieval concepts.

The paper tackles the high training costs and limited adaptability of existing generative retrieval methods by proposing a few-shot indexing framework that prompts an LLM to generate document identifiers without any training. The authors report performance superior to state-of-the-art generative retrieval methods that require heavy training.

Existing generative retrieval (GR) methods rely on training-based indexing, which fine-tunes a model to memorise associations between queries and the document identifiers (docids) of relevant documents. Training-based indexing suffers from high training costs, under-utilisation of pre-trained knowledge in large language models (LLMs), and limited adaptability to dynamic document corpora. To address these issues, we propose a few-shot indexing-based GR framework (Few-Shot GR). It has a few-shot indexing process without any training, where we prompt an LLM to generate docids for all documents in a corpus, ultimately creating a docid bank for the entire corpus. During retrieval, we feed a query to the same LLM and constrain it to generate a docid within the docid bank created during indexing, and then map the generated docid back to its corresponding document. Moreover, we devise few-shot indexing with one-to-many mapping to further enhance Few-Shot GR. Experiments show that Few-Shot GR achieves superior performance to state-of-the-art GR methods requiring heavy training.
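
The indexing and retrieval steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the LLM is replaced by two hypothetical stand-ins (`toy_gen_docid` for the few-shot indexing prompt and `toy_score` for next-token scoring), the `<eos>` marker and all function names are assumptions of this example, and constrained generation is approximated by a greedy walk over a prefix trie built from the docid bank.

```python
from typing import Callable, Dict, List, Tuple

Docid = Tuple[str, ...]
END = "<eos>"  # end-of-docid marker; an assumption of this sketch


def build_docid_bank(docs: Dict[str, str],
                     gen_docid: Callable[[str], List[str]]) -> Dict[Docid, str]:
    """Indexing without training: ask the (stubbed) LLM for one docid per document."""
    bank: Dict[Docid, str] = {}
    for doc_key, text in docs.items():
        bank[tuple(gen_docid(text))] = doc_key  # a colliding docid would overwrite
    return bank


def build_trie(bank: Dict[Docid, str]) -> dict:
    """Prefix trie over docid tokens; it defines which tokens are allowed next."""
    trie: dict = {}
    for docid in bank:
        node = trie
        for tok in docid:
            node = node.setdefault(tok, {})
        node[END] = {}  # marks a complete docid
    return trie


def constrained_decode(query: str, trie: dict,
                       score: Callable[[str, List[str], str], float]) -> Docid:
    """Greedy decoding restricted to the trie, so the output is always in the bank."""
    node, prefix = trie, []
    while True:
        allowed = sorted(node)  # sorted for deterministic tie-breaking
        best = max(allowed, key=lambda tok: score(query, prefix, tok))
        if best == END:
            return tuple(prefix)
        prefix.append(best)
        node = node[best]


# Hypothetical stand-ins for the LLM (not from the paper).
def toy_gen_docid(text: str) -> List[str]:
    return text.lower().split()[:2]  # first two words act as the docid


def toy_score(query: str, prefix: List[str], tok: str) -> float:
    return 1.0 if tok in query.lower().split() else 0.0


docs = {
    "d1": "Python asyncio tutorial",
    "d2": "Python packaging guide",
    "d3": "Rust ownership rules",
}
bank = build_docid_bank(docs, toy_gen_docid)
trie = build_trie(bank)
docid = constrained_decode("how to use asyncio in python", trie, toy_score)
retrieved = bank[docid]  # map the generated docid back to its document
print(docid, retrieved)
```

Because the bank maps docids to documents rather than the reverse, the one-to-many variant mentioned in the abstract could be mimicked here by inserting several docids that point to the same document key. A real system would replace the greedy trie walk with the LLM's own constrained decoding over its token vocabulary.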

