IR · CL · Apr 3, 2023

Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval

arXiv:2304.01019v1 · 6 citations · h-index: 87
Originality: Synthesis-oriented
AI Analysis

This work addresses reproducibility and clarity problems in cross-lingual information retrieval research. It is incremental, since it builds on existing multilingual models and toolkits.

The authors address the confusing and hard-to-reproduce landscape of cross-lingual information retrieval methods in two ways: they provide a conceptual framework, and they implement simple, reproducible baselines in the Anserini and Pyserini IR toolkits for the TREC 2022 NeuCLIR Track collections in Persian, Russian, and Chinese. The resulting runs are effective and set a foundation for future work.

The advent of multilingual language models has generated a resurgence of interest in cross-lingual information retrieval (CLIR), which is the task of searching documents in one language with queries from another. However, the rapid pace of progress has led to a confusing panoply of methods and reproducibility has lagged behind the state of the art. In this context, our work makes two important contributions: First, we provide a conceptual framework for organizing different approaches to cross-lingual retrieval using multi-stage architectures for mono-lingual retrieval as a scaffold. Second, we implement simple yet effective reproducible baselines in the Anserini and Pyserini IR toolkits for test collections from the TREC 2022 NeuCLIR Track, in Persian, Russian, and Chinese. Our efforts are built on a collaboration of the two teams that submitted the most effective runs to the TREC evaluation. These contributions provide a firm foundation for future advances.
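To make the idea of a multi-stage architecture as a scaffold for cross-lingual retrieval more concrete, here is a minimal, self-contained sketch of one possible arrangement: translate the query into the document language, retrieve candidates with BM25, then rerank them. Everything in it is an illustrative assumption rather than the paper's implementation. The toy collection uses Spanish as a stand-in for the NeuCLIR languages, `EN_TO_ES` is a hypothetical word-level translation table standing in for machine translation, the BM25 parameter values are illustrative, and `rerank` uses simple term coverage where a real system would use a neural reranker. It does not use the Anserini or Pyserini APIs.

```python
import math
from collections import Counter

# Toy document-language collection (hypothetical; Spanish stands in for
# the Persian, Russian, and Chinese collections mentioned in the abstract).
DOCS = {
    "d1": "el gato duerme en la casa",
    "d2": "el perro corre en el parque",
    "d3": "la casa tiene un gato y un perro",
}

# Hypothetical word-level translation table; a real system would use MT.
EN_TO_ES = {"cat": "gato", "dog": "perro", "house": "casa", "park": "parque"}


def translate_query(query):
    """Stage 0: map an English query into document-language terms."""
    return [EN_TO_ES[w] for w in query.lower().split() if w in EN_TO_ES]


def bm25_search(terms, docs, k=2, k1=0.9, b=0.4):
    """Stage 1: monolingual BM25 retrieval over the translated query."""
    tokenized = {doc_id: text.split() for doc_id, text in docs.items()}
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized.values()) / n
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized.values() for t in set(toks))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)[:k]


def rerank(terms, candidates, docs):
    """Stage 2: placeholder reranker (query-term coverage, not a neural model)."""
    def coverage(doc_id):
        return len(set(docs[doc_id].split()) & set(terms)) / len(terms)
    return sorted(candidates, key=coverage, reverse=True)


def search(query, k=2):
    """Run the full pipeline: translate, retrieve, rerank."""
    terms = translate_query(query)
    candidates = bm25_search(terms, DOCS, k=k)
    return rerank(terms, candidates, DOCS)


if __name__ == "__main__":
    print(search("cat house"))
    print(search("park"))
```

In this arrangement only the translation step is cross-lingual; the retrieval and reranking stages are ordinary monolingual components, which is why the monolingual pipeline can serve as the scaffold. Other arrangements (for example, translating documents instead of queries) would move the language crossing to a different stage.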

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
