CL, AI | Sep 27, 2022

mRobust04: A Multilingual Version of the TREC Robust 2004 Benchmark

Vitor Jeronymo, Mauricio Nascimento, Roberto Lotufo, Rodrigo Nogueira

arXiv:2209.13738v1 | 0.84 citations | h-index: 32 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work provides a new multilingual evaluation resource for the information retrieval community, but the contribution is incremental, since it adapts an existing benchmark rather than introducing a new one.

The authors address the lack of a multilingual benchmark for information retrieval by creating mRobust04, a version of the TREC Robust 2004 dataset translated into 8 languages, and they report initial results from three multilingual retrievers on the new dataset.

Robust 2004 is an information retrieval benchmark whose large number of judgments per query makes it a reliable evaluation dataset. In this paper, we present mRobust04, a multilingual version of Robust04 that was translated into 8 languages using Google Translate. We also provide results of three different multilingual retrievers on this dataset. The dataset is available at https://huggingface.co/datasets/unicamp-dl/mrobust
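
To illustrate why the number of judgments per query matters, here is a minimal sketch of scoring one ranked list against relevance judgments. The metric (precision at k), the helper names `precision_at_k` and `judged_at_k`, and the toy document IDs are all assumptions made for illustration; they are not taken from the paper or from the dataset. The convention of counting unjudged documents as non-relevant is a common one in retrieval evaluation, and it is the reason sparse judgments can understate a system's quality.

```python
from typing import Dict, List


def precision_at_k(ranking: List[str], qrels: Dict[str, int], k: int) -> float:
    """Fraction of the top-k documents judged relevant (grade > 0).

    Documents missing from qrels (unjudged) are counted as non-relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranking[:k]
    relevant = sum(1 for doc_id in top_k if qrels.get(doc_id, 0) > 0)
    return relevant / k


def judged_at_k(ranking: List[str], qrels: Dict[str, int], k: int) -> float:
    """Fraction of the top-k documents that have any judgment at all."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranking[:k]
    return sum(1 for doc_id in top_k if doc_id in qrels) / k


# Hypothetical judgments for a single query: doc ID -> relevance grade.
qrels = {"d1": 1, "d2": 0, "d3": 2, "d4": 0}

# Hypothetical system output; "d9" was never judged.
ranking = ["d3", "d9", "d1", "d2"]

print(precision_at_k(ranking, qrels, k=4))  # 0.5
print(judged_at_k(ranking, qrels, k=4))     # 0.75
```

Here "d9" lowers the score only because nobody assessed it. With many judgments per query, fewer retrieved documents fall into that unjudged gap, so the measured score is more trustworthy.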
