CL, IR · Sep 23, 2022

Promptagator: Few-shot Dense Retrieval From 8 Examples

CMU
arXiv:2209.11755v1 · 335 citations · h-index: 46
Originality: Highly original
AI Analysis

This work addresses how to adapt retrieval systems to diverse tasks that have little supervision, a problem relevant to both researchers and practitioners in information retrieval. Rather than an incremental improvement, it offers a new approach that uses LLMs to achieve significant gains.

The paper tackles few-shot dense retrieval by proposing Promptagator, which prompts a large language model with no more than 8 examples to generate task-specific queries. This makes it possible to build task-specific retrievers without training on large datasets such as MS MARCO or Natural Questions. Trained on the generated data, dual encoders outperform heavily engineered models such as ColBERT v2 by more than 1.2 nDCG on average across 11 retrieval sets, and training re-rankers on the same data yields a further 5.0 nDCG improvement.
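The gains above are reported in nDCG (normalized discounted cumulative gain). As a rough illustration of what that metric measures, here is a minimal sketch of nDCG@k with linear gains, the formulation used by trec_eval-style tools. The cutoff `k=10` and the helper names `dcg_at_k` and `ndcg_at_k` are assumptions for illustration, not taken from the paper.

```python
import math
from typing import Dict, List


def dcg_at_k(gains: List[float], k: int) -> float:
    """Discounted cumulative gain: each gain is divided by log2(rank + 1), ranks starting at 1."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids: List[str], qrels: Dict[str, float], k: int = 10) -> float:
    """nDCG@k: DCG of the system ranking divided by DCG of the ideal ranking."""
    # Relevance of each retrieved document (unjudged documents count as 0).
    gains = [qrels.get(doc_id, 0.0) for doc_id in ranked_doc_ids]
    # Ideal ranking: all judged documents sorted by relevance, best first.
    ideal_dcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    if ideal_dcg == 0.0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(gains, k) / ideal_dcg
```

For example, `ndcg_at_k(["d3", "d1", "d2"], {"d1": 1, "d2": 1})` is about 0.693, because the two relevant documents appear at ranks 2 and 3 instead of 1 and 2. Benchmark tables commonly report this value multiplied by 100, so a "1.2 nDCG" difference would correspond to 0.012 on this 0 to 1 scale, assuming that convention.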

Much recent research on information retrieval has focused on how to transfer from one task (typically with abundant supervised data) to various other tasks where supervision is limited, with the implicit assumption that it is possible to generalize from one task to all the rest. However, this overlooks the fact that there are many diverse and unique retrieval tasks, each targeting different search intents, queries, and search domains. In this paper, we propose to work on Few-shot Dense Retrieval, a setting where each task comes with a short description and a few examples. To amplify the power of a few examples, we propose Prompt-based Query Generation for Retriever (Promptagator), which leverages large language models (LLMs) as few-shot query generators and creates task-specific retrievers based on the generated data. Powered by the generalization ability of LLMs, Promptagator makes it possible to create task-specific end-to-end retrievers solely based on a few examples, without using Natural Questions or MS MARCO to train question generators or dual encoders. Surprisingly, LLM prompting with no more than 8 examples allows dual encoders to outperform heavily engineered models trained on MS MARCO, such as ColBERT v2, by more than 1.2 nDCG on average on 11 retrieval sets. Further training standard-size re-rankers using the same generated data yields another 5.0-point nDCG improvement. Our studies show that query generation can be far more effective than previously observed, especially when a small amount of task-specific knowledge is given.
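The core idea of using an LLM as a few-shot query generator can be sketched as follows: a prompt is assembled from the task description and at most 8 (document, query) examples, then the LLM is asked to write a query for each unlabeled document, producing synthetic training pairs. This is a minimal sketch under stated assumptions: the `Document:`/`Query:` template, the `Example` type, and the `llm` callable are hypothetical stand-ins, not the paper's actual prompts or model interface, and the later steps of training a dual encoder or re-ranker on the pairs are not shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

MAX_EXAMPLES = 8  # the abstract's "no more than 8 examples"


@dataclass
class Example:
    document: str
    query: str


def build_prompt(task_description: str, examples: Sequence[Example], target_document: str) -> str:
    """Assemble a few-shot prompt: task description, demonstrations, then the new document.

    The "Document:"/"Query:" template is a hypothetical choice for illustration.
    """
    if not 1 <= len(examples) <= MAX_EXAMPLES:
        raise ValueError(f"expected 1..{MAX_EXAMPLES} examples, got {len(examples)}")
    parts = [task_description.strip(), ""]
    for ex in examples:
        parts.append(f"Document: {ex.document}")
        parts.append(f"Query: {ex.query}")
        parts.append("")
    # Leave the final query slot open for the LLM to complete.
    parts.append(f"Document: {target_document}")
    parts.append("Query:")
    return "\n".join(parts)


def generate_training_pairs(
    task_description: str,
    examples: Sequence[Example],
    corpus: Sequence[str],
    llm: Callable[[str], str],  # hypothetical: any function mapping a prompt to generated text
) -> List[Tuple[str, str]]:
    """Generate one synthetic (query, document) pair per corpus document."""
    pairs = []
    for doc in corpus:
        query = llm(build_prompt(task_description, examples, doc)).strip()
        if query:  # drop empty generations
            pairs.append((query, doc))
    return pairs
```

A stub such as `lambda prompt: "some query"` can stand in for the LLM when testing the plumbing. Keeping the demonstrations task-specific is what lets the same generator produce very different query styles for different retrieval tasks.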
