Planning Ahead in Generative Retrieval: Guiding Autoregressive Generation through Simultaneous Decoding
This work addresses efficiency and effectiveness bottlenecks in generative retrieval, representing a significant incremental improvement over existing methods.
The paper tackles the problem of slow query latency in generative retrieval models by introducing PAG, a novel optimization and decoding approach that guides autoregressive generation of document identifiers through simultaneous decoding, resulting in a 15.6% MRR improvement on MS MARCO and a 22x speedup in query latency.
This paper introduces PAG, a novel optimization and decoding approach that guides the autoregressive generation of document identifiers in generative retrieval models through simultaneous decoding. To this end, PAG constructs both a set-based and a sequential identifier for each document. Motivated by the bag-of-words assumption in information retrieval, the set-based identifier is built on lexical tokens. The sequential identifier, on the other hand, is obtained by quantizing relevance-based representations of documents. Extensive experiments on MS MARCO and TREC Deep Learning Track data reveal that PAG outperforms the state-of-the-art generative retrieval model by a large margin (e.g., a 15.6% MRR improvement on MS MARCO), while achieving a 22x speedup in query latency.
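The abstract does not spell out how the two identifiers interact at decoding time, so the following is only a toy sketch of one plausible reading, not the authors' implementation. It assumes that a single simultaneous-decoding pass yields a weight per lexical token, that a document's set-based score is the sum of the weights of its tokens, and that beam search over sequential identifiers ranks each prefix by its accumulated score plus the best set-based score of any document below that prefix. The corpus, the names `DOCS`, `set_scores`, `prefix_bonus`, and `guided_beam_search`, and all numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy corpus. Each document has a set-based ID (lexical tokens)
# and a sequential ID (a tuple of quantization codes). All values are made up.
DOCS = {
    "d1": {"set_id": {"neural", "retrieval"}, "seq_id": (0, 1)},
    "d2": {"set_id": {"neural", "network"}, "seq_id": (0, 2)},
    "d3": {"set_id": {"cooking", "pasta"}, "seq_id": (1, 0)},
}


def set_scores(token_weights):
    """Score every document at once from per-token weights (assumed to come
    from one simultaneous-decoding pass over the query)."""
    return {
        doc: sum(token_weights.get(tok, 0.0) for tok in info["set_id"])
        for doc, info in DOCS.items()
    }


def prefix_bonus(doc_scores):
    """For each sequential-ID prefix, keep the best set-based score among
    the documents whose sequential ID starts with that prefix."""
    best = defaultdict(lambda: float("-inf"))
    for doc, info in DOCS.items():
        seq = info["seq_id"]
        for i in range(1, len(seq) + 1):
            best[seq[:i]] = max(best[seq[:i]], doc_scores[doc])
    return best


def guided_beam_search(step_score, token_weights, beam_size=2):
    """Beam search over sequential IDs, ranked by accumulated step score
    plus the look-ahead bonus from the set-based scores."""
    bonus = prefix_bonus(set_scores(token_weights))
    length = len(next(iter(DOCS.values()))["seq_id"])
    beams = [((), 0.0)]
    for _ in range(length):
        candidates = []
        for prefix, score in beams:
            # Only expand to prefixes that exist in the corpus.
            for p in list(bonus):
                if len(p) == len(prefix) + 1 and p[:-1] == prefix:
                    candidates.append((p, score + step_score(p)))
        candidates.sort(key=lambda c: c[1] + bonus[c[0]], reverse=True)
        beams = candidates[:beam_size]
    seq_to_doc = {info["seq_id"]: doc for doc, info in DOCS.items()}
    return [(seq_to_doc[p], s + bonus[p]) for p, s in beams]


# Usage: a flat autoregressive scorer, so the set-based guidance decides.
weights = {"neural": 1.0, "retrieval": 2.0, "network": 0.5}
results = guided_beam_search(lambda prefix: 0.0, weights)
print(results)  # [('d1', 3.0), ('d2', 1.5)]
```

In this sketch the look-ahead bonus prunes the `(1, ...)` branch before its second code is ever scored, which illustrates how document-level evidence gathered up front could keep a small beam from discarding relevant prefixes early.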