CLIR
Jul 1, 2024

BERGEN: A Benchmarking Library for Retrieval-Augmented Generation

arXiv:2407.01102v1
37 citations
h-index: 26
Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge researchers and practitioners face when comparing RAG approaches, though its contribution is incremental because it builds on existing methods.

The authors tackled the problem of inconsistent benchmarking in Retrieval-Augmented Generation (RAG) by developing BERGEN, a library that standardizes experiments, and they benchmarked state-of-the-art components like retrievers, rerankers, and LLMs in QA tasks.

Retrieval-Augmented Generation makes it possible to enhance Large Language Models with external knowledge. In response to the recent popularity of generative LLMs, many RAG approaches have been proposed, which involve a large number of different configurations such as evaluation datasets, collections, metrics, retrievers, and LLMs. Inconsistent benchmarking poses a major challenge in comparing approaches and understanding the impact of each component in the pipeline. In this work, we study best practices that lay the groundwork for a systematic evaluation of RAG and present BERGEN, an end-to-end library for reproducible research standardizing RAG experiments. In an extensive study focusing on QA, we benchmark different state-of-the-art retrievers, rerankers, and LLMs. Additionally, we analyze existing RAG metrics and datasets. Our open-source library BERGEN is available at https://github.com/naver/bergen.

Code Implementations: 1 repo
