IR AI CL LG · Aug 4, 2024

Generative Retrieval with Few-shot Indexing

Arian Askari, Chuan Meng, Mohammad Aliannejadi, Zhaochun Ren, Evangelos Kanoulas, Suzan Verberne

arXiv:2408.02152v38.17 citations · h-index: 18

Originality: Incremental advance

AI Analysis

This work addresses inefficient, inflexible retrieval for users who need access to changing document collections. It is an incremental advance, since it builds on existing generative retrieval concepts.

The paper tackles the high training costs and limited adaptability of existing generative retrieval methods by proposing a few-shot indexing framework that prompts an LLM to generate document identifiers without training, achieving superior performance to state-of-the-art methods.

Existing generative retrieval (GR) methods rely on training-based indexing, which fine-tunes a model to memorise associations between queries and the document identifiers (docids) of relevant documents. Training-based indexing suffers from high training costs, under-utilisation of pre-trained knowledge in large language models (LLMs), and limited adaptability to dynamic document corpora. To address these issues, we propose a few-shot indexing-based GR framework (Few-Shot GR). Its few-shot indexing process requires no training: we prompt an LLM to generate docids for all documents in a corpus, ultimately creating a docid bank for the entire corpus. During retrieval, we feed a query to the same LLM and constrain it to generate a docid within the docid bank created during indexing, and then map the generated docid back to its corresponding document. Moreover, we devise few-shot indexing with one-to-many mapping to further enhance Few-Shot GR. Experiments show that Few-Shot GR achieves superior performance to state-of-the-art GR methods requiring heavy training.
