CL | Feb 27, 2024

Retrieval is Accurate Generation

arXiv:2402.17532v3 | 13 citations | h-index: 10 | ICLR
Originality: Highly original
AI Analysis

This work addresses knowledge-intensive and open-ended text generation, and it proposes a paradigm shift rather than an incremental improvement.

Standard language models generate text by selecting tokens from a fixed vocabulary. The paper instead introduces a method that selects context-aware phrases from supporting documents. Compared with a standard language model counterpart, it raises accuracy on OpenbookQA from 23.47% to 36.27% and the MAUVE score in open-ended text generation from 42.61% to 81.58%.

Standard language models generate text by selecting tokens from a fixed, finite, and standalone vocabulary. We introduce a novel method that selects context-aware phrases from a collection of supporting documents. One of the most significant challenges for this paradigm shift is determining the training oracles, because a string of text can be segmented in various ways and each segment can be retrieved from numerous possible documents. To address this, we propose to initialize the training oracles using linguistic heuristics and, more importantly, bootstrap the oracles through iterative self-reinforcement. Extensive experiments show that our model not only outperforms standard language models on a variety of knowledge-intensive tasks but also demonstrates improved generation quality in open-ended text generation. For instance, compared to the standard language model counterpart, our model raises the accuracy from 23.47% to 36.27% on OpenbookQA, and improves the MAUVE score from 42.61% to 81.58% in open-ended text generation. Remarkably, our model also achieves the best performance and the lowest latency among several retrieval-augmented baselines. In conclusion, we assert that retrieval is more accurate generation and hope that our work will encourage further research on this new paradigm shift.
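
The core idea in the abstract, choosing the next piece of text as a context-aware phrase from supporting documents instead of a token from a fixed vocabulary, can be illustrated with a toy sketch. This is not the authors' implementation: the bag-of-words `embed` function, the cosine scoring, the `max_len` and `threshold` parameters, and all function names are assumptions made for illustration, where a real system would presumably use learned encoders. Each candidate phrase is represented by the text that precedes it in its source document, and the phrase whose left context best matches the current prefix is selected.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def candidate_phrases(docs, max_len=3):
    """Yield (phrase, left_context) for every span of up to max_len words."""
    for doc in docs:
        words = doc.split()
        for start in range(1, len(words)):
            for end in range(start + 1, min(start + max_len, len(words)) + 1):
                yield " ".join(words[start:end]), " ".join(words[:start])


def next_phrase(prefix, docs, threshold=0.5):
    """Return the phrase whose left context best matches the prefix.

    Returns None when no candidate reaches the (hypothetical) threshold;
    a caller could then fall back to ordinary token-level generation.
    """
    prefix_vec = embed(prefix)
    best_phrase, best_score = None, 0.0
    for phrase, context in candidate_phrases(docs):
        s = cosine(prefix_vec, embed(context))
        if s > best_score:  # strict comparison: the first best candidate wins ties
            best_phrase, best_score = phrase, s
    return best_phrase if best_score >= threshold else None


docs = [
    "the eiffel tower is located in paris france",
    "the great wall is located in china",
]
print(next_phrase("the eiffel tower is located in", docs))  # paris
```

Because the candidates are tied to their surrounding text, the same surface phrase can score differently depending on which document it comes from, which is one way to read the abstract's contrast between "context-aware phrases" and a "standalone vocabulary".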

Code Implementations: 1 repo
