From RAG to RICHES: Retrieval Interlaced with Sequence Generation
This work addresses the need for more efficient and adaptable retrieval-augmented generation (RAG) systems for natural language processing tasks. It may appear incremental, since it builds on existing instruction-tuned models without new training.
The paper tackles the separation of retriever and generator components in conventional RAG systems by introducing RICHES, which interleaves retrieval with sequence generation in a single decoding pass. It demonstrates strong performance on open-domain question answering (ODQA) tasks such as attributed and multi-hop QA.
We present RICHES, a novel approach that interleaves retrieval with sequence generation tasks. RICHES offers an alternative to conventional RAG systems by eliminating the need for a separate retriever and generator. It retrieves documents by directly decoding their contents, with decoding constrained to the corpus. Unifying retrieval with generation allows us to adapt to diverse new tasks via prompting alone. RICHES can work with any instruction-tuned model, without additional training. It provides attributed evidence, supports multi-hop retrieval, and interleaves thoughts to plan what to retrieve next, all within a single decoding pass of the LLM. We demonstrate the strong performance of RICHES across ODQA tasks, including attributed and multi-hop QA.
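To make "decoding constrained to the corpus" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tiny whitespace-tokenized corpus, a prefix trie as the constraint structure, and a toy scoring function standing in for the LLM's next-token scores. The names `build_trie`, `constrained_greedy_decode`, `toy_score`, and the `<eos>` marker are hypothetical and introduced only for illustration.

```python
from typing import Callable, Dict, List

END = "<eos>"  # hypothetical end-of-passage marker (assumption)


def build_trie(passages: List[List[str]]) -> Dict:
    """Build a nested-dict prefix trie over tokenized corpus passages."""
    root: Dict = {}
    for tokens in passages:
        node = root
        for tok in tokens + [END]:
            node = node.setdefault(tok, {})
    return root


def constrained_greedy_decode(
    trie: Dict, score: Callable[[List[str], str], float]
) -> List[str]:
    """Greedy decoding where each step may only pick a token that
    continues some passage in the corpus."""
    node, output = trie, []
    while True:
        allowed = sorted(node.keys())  # sorted for deterministic tie-breaking
        if not allowed:  # empty corpus: nothing to decode
            return output
        best = max(allowed, key=lambda tok: score(output, tok))
        if best == END:
            return output
        output.append(best)
        node = node[best]


def toy_score(query: str) -> Callable[[List[str], str], float]:
    """Stand-in for an LLM: prefer tokens that also occur in the query."""
    query_words = set(query.lower().split())
    return lambda prefix, tok: 1.0 if tok in query_words else 0.0


# Toy corpus (assumption): three short passages, whitespace-tokenized.
passages = [
    "paris is the capital of france".split(),
    "berlin is the capital of germany".split(),
    "paris hosted the olympics".split(),
]
trie = build_trie(passages)
decoded = constrained_greedy_decode(trie, toy_score("what is berlin the capital of"))
print(" ".join(decoded))
```

Because every step is restricted to continuations that exist in the trie, the decoded sequence is always a verbatim corpus passage, which is what lets generated evidence double as retrieval with attribution. A real system would replace `toy_score` with the model's token log-probabilities and would likely use beam search and a more compact index than a nested dictionary.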