IR | CL | Apr 22, 2024

Planning Ahead in Generative Retrieval: Guiding Autoregressive Generation through Simultaneous Decoding

arXiv:2404.14600v1 | 36 citations | h-index: 10 | SIGIR
Originality: Incremental advance
AI Analysis

This work addresses efficiency and performance bottlenecks in generative retrieval for information retrieval systems, representing a significant incremental improvement over existing methods.

The paper tackles slow query latency in generative retrieval by introducing PAG, an optimization and decoding approach that guides autoregressive generation of document identifiers through simultaneous decoding. It reports a 15.6% MRR improvement on MS MARCO and a 22x speedup in query latency.

This paper introduces PAG, a novel optimization and decoding approach that guides autoregressive generation of document identifiers in generative retrieval models through simultaneous decoding. To this end, PAG constructs both a set-based and a sequential identifier for each document. Motivated by the bag-of-words assumption in information retrieval, the set-based identifier is built on lexical tokens. The sequential identifier, on the other hand, is obtained by quantizing relevance-based representations of documents. Extensive experiments on MS MARCO and TREC Deep Learning Track data reveal that PAG outperforms the state-of-the-art generative retrieval model by a large margin (e.g., a 15.6% MRR improvement on MS MARCO), while achieving a 22x speedup in query latency.
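The interplay between the two identifiers can be illustrated with a toy sketch. It is not the paper's implementation: the corpus, the weights, and the names `DOCS`, `simultaneous_scores`, and `guided_beam_search` are all hypothetical. It also assumes that per-step scores are independent of the prefix, and that the look-ahead bonus for a prefix is the best set-based score among the documents sharing that prefix. The idea shown is that a one-shot, set-based score for every document steers which sequential-identifier prefixes survive beam search.

```python
# Toy corpus (made-up values): each document has a set-based identifier
# (lexical tokens) and a sequential identifier (quantization codes).
DOCS = {
    "d1": {"set_id": {"neural", "retrieval"}, "seq_id": (0, 1)},
    "d2": {"set_id": {"neural", "decoding"}, "seq_id": (0, 2)},
    "d3": {"set_id": {"cooking", "pasta"}, "seq_id": (1, 0)},
}


def simultaneous_scores(token_weights):
    """Score every document in one step: sum the query's token weights
    over the tokens in the document's set-based identifier."""
    return {
        doc: sum(token_weights.get(tok, 0.0) for tok in info["set_id"])
        for doc, info in DOCS.items()
    }


def guided_beam_search(token_weights, step_scores, beam_size=2):
    """Beam search over sequential identifiers, ranked with a look-ahead bonus.

    step_scores[t][code] is a toy stand-in for the model's score of emitting
    `code` at step t (assumed prefix-independent to keep the sketch short).
    """
    doc_scores = simultaneous_scores(token_weights)
    seq_len = len(next(iter(DOCS.values()))["seq_id"])
    beams = [((), 0.0)]  # (prefix, accumulated sequential score)
    lookahead = {}
    for step in range(seq_len):
        # Best set-based score among documents sharing each valid prefix.
        lookahead = {}
        for doc, info in DOCS.items():
            prefix = info["seq_id"][: step + 1]
            best = lookahead.get(prefix, float("-inf"))
            lookahead[prefix] = max(best, doc_scores[doc])
        # Extend each surviving beam by one code, restricted to valid prefixes.
        candidates = []
        for prefix, seq_score in beams:
            for new_prefix in lookahead:
                if new_prefix[:-1] == prefix:
                    code_score = step_scores[step].get(new_prefix[-1], 0.0)
                    candidates.append((new_prefix, seq_score + code_score))
        # Rank by sequential score plus the look-ahead bonus.
        candidates.sort(key=lambda c: c[1] + lookahead[c[0]], reverse=True)
        beams = candidates[:beam_size]
    seq_to_doc = {info["seq_id"]: doc for doc, info in DOCS.items()}
    return [(seq_to_doc[p], s + lookahead[p]) for p, s in beams]


if __name__ == "__main__":
    weights = {"neural": 1.0, "retrieval": 2.0, "decoding": 0.5}
    steps = [{0: 0.1, 1: 0.3}, {0: 0.0, 1: 0.2, 2: 0.4}]
    print(guided_beam_search(weights, steps, beam_size=1))
```

In this toy setup with `beam_size=1`, the first-step sequential score alone favors code `1`, yet the look-ahead bonus keeps the prefix `(0,)` because it leads to the document with the highest set-based score. Keeping good prefixes alive under a small beam is the kind of guidance the abstract attributes to simultaneous decoding.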

Code Implementations: 1 repo
