IR | AI | CL | Apr 17, 2021

BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

arXiv:2104.08663v4 | 1707 citations | Has Code
AI Analysis

BEIR gives researchers a standardized benchmark for evaluating and improving the out-of-distribution generalization of information retrieval models. Its contribution is incremental, since it builds on existing datasets and methods.

The authors address the lack of robust evaluation for neural information retrieval models by introducing BEIR, a heterogeneous benchmark of 18 datasets on which they evaluate 10 retrieval systems. They find that BM25 is a robust baseline and that re-ranking and late-interaction models achieve the best average zero-shot performance, but at high computational cost. Dense and sparse retrieval models are more efficient but often underperform the other approaches.

Existing neural information retrieval (IR) models have often been studied in homogeneous and narrow settings, which has considerably limited insights into their out-of-distribution (OOD) generalization capabilities. To address this, and to facilitate researchers to broadly evaluate the effectiveness of their models, we introduce Benchmarking-IR (BEIR), a robust and heterogeneous evaluation benchmark for information retrieval. We leverage a careful selection of 18 publicly available datasets from diverse text retrieval tasks and domains and evaluate 10 state-of-the-art retrieval systems including lexical, sparse, dense, late-interaction and re-ranking architectures on the BEIR benchmark. Our results show BM25 is a robust baseline and re-ranking and late-interaction-based models on average achieve the best zero-shot performances, however, at high computational costs. In contrast, dense and sparse-retrieval models are computationally more efficient but often underperform other approaches, highlighting the considerable room for improvement in their generalization capabilities. We hope this framework allows us to better evaluate and understand existing retrieval systems, and contributes to accelerating progress towards better robust and generalizable systems in the future. BEIR is publicly available at https://github.com/UKPLab/beir.
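
Since the abstract singles out BM25 as a robust baseline, a small illustration of lexical scoring may help. The following is a minimal sketch of Okapi BM25 in plain Python, not the benchmark's own implementation. The function names `tokenize` and `bm25_rank`, the whitespace tokenizer, and the parameter values `k1=1.5` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real systems use proper analyzers.
    return text.lower().split()


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Rank docs (dict of doc_id -> text) for a query; returns [(doc_id, score)], best first."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    query_terms = tokenize(query)
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `bm25_rank("lexical ranking", docs)` on a small dictionary of documents returns the documents ordered by score. Documents that share no term with the query score zero, which is the vocabulary-mismatch limitation that dense and late-interaction models try to overcome.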

Code Implementations: 3 repos
