IR, CL | Oct 30, 2024

CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation

arXiv:2410.23090v1 | 13 citations | h-index: 20
Originality: Synthesis-oriented
AI Analysis

This paper addresses the gap in evaluating RAG systems on real-world multi-turn conversations, though its contribution is incremental: it focuses on benchmarking rather than on developing a novel method.

The authors address the lack of benchmarks for multi-turn conversational retrieval-augmented generation (RAG) by introducing CORAL, a large-scale benchmark derived from Wikipedia. Their evaluation on CORAL reveals significant room for improvement in existing methods.

Retrieval-Augmented Generation (RAG) has become a powerful paradigm for enhancing large language models (LLMs) through external knowledge retrieval. Despite its widespread attention, existing academic research predominantly focuses on single-turn RAG, leaving a significant gap in addressing the complexities of multi-turn conversations found in real-world applications. To bridge this gap, we introduce CORAL, a large-scale benchmark designed to assess RAG systems in realistic multi-turn conversational settings. CORAL includes diverse information-seeking conversations automatically derived from Wikipedia and tackles key challenges such as open-domain coverage, knowledge intensity, free-form responses, and topic shifts. It supports three core tasks of conversational RAG: passage retrieval, response generation, and citation labeling. We propose a unified framework to standardize various conversational RAG methods and conduct a comprehensive evaluation of these methods on CORAL, demonstrating substantial opportunities for improving existing approaches.

Code Implementations: 1 repo