CL · AI · IR · LG · May 26, 2023

Multiview Identifiers Enhanced Generative Retrieval

arXiv:2305.16675v1 · 239 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of generating distinctive identifiers for passages in retrieval systems. The advance is incremental: it builds on existing methods by extending the types of identifier used.

The paper tackles generative retrieval by proposing synthetic identifiers, which are generated from passage content and carry contextualized information. It combines them with titles and substrings as multiview identifiers to improve passage representation and ranking. On three public datasets, the approach achieved the best performance among generative retrieval methods, demonstrating effectiveness and robustness.

Instead of simply matching a query to pre-existing passages, generative retrieval generates identifier strings of passages as the retrieval target. At a cost, the identifier must be distinctive enough to represent a passage. Current approaches use either a numeric ID or a text piece (such as a title or substrings) as the identifier. However, these identifiers cannot cover a passage's content well. As such, we are motivated to propose a new type of identifier, synthetic identifiers, that are generated based on the content of a passage and could integrate contextualized information that text pieces lack. Furthermore, we simultaneously consider multiview identifiers, including synthetic identifiers, titles, and substrings. These views of identifiers complement each other and facilitate the holistic ranking of passages from multiple perspectives. We conduct a series of experiments on three public datasets, and the results indicate that our proposed approach performs the best in generative retrieval, demonstrating its effectiveness and robustness.
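The idea of ranking passages holistically from several identifier views can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a generative model has already produced scored identifier strings per view, and it assumes a simple aggregation rule: a passage's score is the sum of the scores of all predicted identifiers that belong to it. The function name `rank_passages`, the view names, and the toy data are hypothetical.

```python
from collections import defaultdict


def rank_passages(predictions, passage_identifiers):
    """Rank passages by summing scores of matching predicted identifiers.

    predictions: dict mapping view name -> list of (identifier, score)
        pairs, assumed to come from a generative model (not shown here).
    passage_identifiers: dict mapping passage id -> dict mapping view name
        -> set of identifier strings belonging to that passage.
    Returns a list of (passage_id, total_score), best first. Passages with
    no matching identifier are omitted.
    """
    totals = defaultdict(float)
    for passage_id, views in passage_identifiers.items():
        for view, scored_identifiers in predictions.items():
            own_identifiers = views.get(view, set())
            for identifier, score in scored_identifiers:
                # Assumption: exact set membership stands in for
                # "this identifier belongs to the passage".
                if identifier in own_identifiers:
                    totals[passage_id] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Toy data: three views (title, substring, synthetic identifier).
predictions = {
    "title": [("Eiffel Tower", 1.0)],
    "substring": [("built in 1889", 0.5), ("located in Paris", 0.25)],
    "synthetic": [("when was the eiffel tower built", 0.75)],
}
passage_identifiers = {
    "p1": {
        "title": {"Eiffel Tower"},
        "substring": {"built in 1889"},
        "synthetic": {"when was the eiffel tower built"},
    },
    "p2": {"title": {"Paris"}, "substring": {"located in Paris"}},
    "p3": {"title": {"Louvre"}, "substring": {"opened in 1793"}},
}

ranking = rank_passages(predictions, passage_identifiers)
```

In this toy run, `p1` is supported by all three views and ranks above `p2`, which matches on a single substring; `p3` matches nothing and is left out. Summation is only one possible way to combine views and is used here purely for illustration.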

Code Implementations: 2 repos