IR | CL | Nov 6, 2023

GLEN: Generative Retrieval via Lexical Index Learning

arXiv:2311.03057v1141 citations | h-index: 10 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses efficiency and accuracy issues in document retrieval for AI and information retrieval systems, representing an incremental improvement over existing generative retrieval methods.

The paper targets two challenges in generative retrieval: the discrepancy between the knowledge of pre-trained language models and the document identifiers they must generate, and the gap between training and inference. It proposes GLEN, which uses dynamic lexical identifiers and collision-free inference, and reports state-of-the-art or competitive performance on NQ320k, MS MARCO, and BEIR.

Generative retrieval sheds light on a new paradigm of document retrieval, aiming to directly generate the identifier of a relevant document for a query. While it has the advantage of bypassing the construction of auxiliary index structures, existing studies face two significant challenges: (i) the discrepancy between the knowledge of pre-trained language models and identifiers and (ii) the gap between training and inference, which makes learning to rank difficult. To overcome these challenges, we propose a novel generative retrieval method, namely Generative retrieval via LExical iNdex learning (GLEN). For training, GLEN exploits a dynamic lexical identifier using a two-phase index learning strategy, enabling it to learn meaningful lexical identifiers and relevance signals between queries and documents. For inference, GLEN utilizes collision-free inference, using identifier weights to rank documents without additional overhead. Experimental results show that GLEN achieves state-of-the-art or competitive performance against existing generative retrieval methods on various benchmark datasets, e.g., NQ320k, MS MARCO, and BEIR. The code is available at https://github.com/skleee/GLEN.
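
The abstract's idea of lexical identifiers and weight-based ranking can be illustrated with a toy sketch. This is not the authors' implementation: the token weights, the function names (`lexical_identifier`, `build_index`, `rank_colliding_docs`), and the weighted-sum scoring rule are all assumptions made for illustration, since the abstract does not specify how identifier weights are computed or combined. It only shows how documents that share the same lexical identifier could still be ordered by their weights.

```python
from collections import defaultdict

# Hypothetical per-document token weights (toy values, not from the paper).
DOC_TOKEN_WEIGHTS = {
    "doc_a": {"olympic": 0.9, "games": 0.8, "history": 0.4},
    "doc_b": {"olympic": 0.7, "games": 0.6, "tokyo": 0.5},
    "doc_c": {"python": 0.9, "language": 0.7, "snake": 0.1},
}


def lexical_identifier(token_weights, length=2):
    """Use the `length` highest-weighted tokens as the document identifier."""
    # Sort by descending weight; break ties alphabetically for determinism.
    ranked = sorted(token_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return tuple(token for token, _ in ranked[:length])


def build_index(doc_token_weights, length=2):
    """Map each lexical identifier to the documents that share it."""
    index = defaultdict(list)
    for doc_id, weights in doc_token_weights.items():
        index[lexical_identifier(weights, length)].append(doc_id)
    return dict(index)


def rank_colliding_docs(identifier, query_token_scores, index, doc_token_weights):
    """Order documents sharing `identifier` by an assumed weighted-sum score."""
    def score(doc_id):
        weights = doc_token_weights[doc_id]
        # Assumed scoring rule: query score times document weight, per token.
        return sum(query_token_scores.get(t, 0.0) * weights[t] for t in identifier)

    return sorted(index.get(identifier, []), key=score, reverse=True)


index = build_index(DOC_TOKEN_WEIGHTS)
# Hypothetical query-side scores for the generated identifier tokens.
query_scores = {"olympic": 1.0, "games": 0.5}
ranking = rank_colliding_docs(("olympic", "games"), query_scores, index, DOC_TOKEN_WEIGHTS)
print(ranking)
```

In this toy data, `doc_a` and `doc_b` collide on the identifier `("olympic", "games")`; the weights break the tie without any extra lookup structure, which is the intuition behind ranking "without additional overhead".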

Code Implementations: 1 repo
