IR, CL | Oct 1, 2020

RRF102: Meeting the TREC-COVID Challenge with a 100+ Runs Ensemble

arXiv:2010.00200v1 | 17 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses information retrieval over biomedical data during a pandemic, but its contribution is incremental: it combines existing methods in an ensemble.

The paper tackled the problem of building a search engine for a rapidly evolving biomedical collection in the TREC-COVID challenge by proposing a weighted hierarchical rank fusion approach that ensembles 102 runs, achieving state-of-the-art performance in rounds 4 and 5.

In this paper, we report the results of our participation in the TREC-COVID challenge. To meet the challenge of building a search engine for a rapidly evolving biomedical collection, we propose a simple yet effective weighted hierarchical rank fusion approach that ensembles 102 runs from (a) lexical and semantic retrieval systems, (b) pre-trained and fine-tuned BERT rankers, and (c) relevance feedback runs. Our ablation studies demonstrate the contribution of each of these systems to the overall ensemble. The submitted ensemble runs achieved state-of-the-art performance in rounds 4 and 5 of the TREC-COVID challenge.
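
To make the idea of weighted hierarchical rank fusion concrete, here is a minimal sketch based on reciprocal rank fusion (RRF), which the paper's title refers to. It is an illustration under stated assumptions, not the authors' implementation: the function names `weighted_rrf` and `hierarchical_rrf` are hypothetical, the smoothing constant `k = 60` is the value conventionally used for RRF, and the two-level grouping and the weights are placeholders, since the abstract does not give the actual hierarchy or weights.

```python
from collections import defaultdict


def weighted_rrf(runs, weights=None, k=60):
    """Fuse ranked lists of document ids with weighted reciprocal rank fusion.

    Each run contributes weight / (k + rank) to a document's score,
    where rank starts at 1. k=60 is a conventional default (assumption).
    """
    if weights is None:
        weights = [1.0] * len(runs)
    scores = defaultdict(float)
    for run, weight in zip(runs, weights):
        for rank, doc_id in enumerate(run, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def hierarchical_rrf(groups, group_weights=None, k=60):
    """Two-level fusion (assumed layout): fuse the runs inside each group,
    then fuse the resulting group-level rankings with per-group weights."""
    fused_groups = [weighted_rrf(runs, k=k) for runs in groups]
    return weighted_rrf(fused_groups, weights=group_weights, k=k)


if __name__ == "__main__":
    # Toy rankings, purely illustrative (not data from the paper).
    lexical_runs = [["d1", "d2", "d3"], ["d2", "d1", "d4"]]
    neural_runs = [["d3", "d2", "d5"]]
    print(hierarchical_rrf([lexical_runs, neural_runs], group_weights=[1.0, 1.0]))
```

Fusing within groups first keeps a family with many similar runs from dominating the final ranking by sheer count, and the group-level weights then control how much each family contributes.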

