IR, CL · Oct 1, 2020

RRF102: Meeting the TREC-COVID Challenge with a 100+ Runs Ensemble

Michael Bendersky, Honglei Zhuang, Ji Ma, Shuguang Han, Keith Hall, Ryan McDonald

arXiv:2010.00200v1

Originality: Synthesis-oriented

AI Analysis

This work addresses information retrieval over a biomedical collection during a pandemic. It is incremental: rather than introducing a new retrieval model, it combines existing methods in an ensemble.

The paper tackles the problem of building a search engine for a rapidly evolving biomedical collection in the TREC-COVID challenge. It proposes a weighted hierarchical rank fusion approach that ensembles 102 runs and achieves state-of-the-art performance in rounds 4 and 5.

In this paper, we report the results of our participation in the TREC-COVID challenge. To meet the challenge of building a search engine for a rapidly evolving biomedical collection, we propose a simple yet effective weighted hierarchical rank fusion approach that ensembles 102 runs from (a) lexical and semantic retrieval systems, (b) pre-trained and fine-tuned BERT rankers, and (c) relevance feedback runs. Our ablation studies demonstrate the contribution of each of these systems to the overall ensemble. The submitted ensemble runs achieved state-of-the-art performance in rounds 4 and 5 of the TREC-COVID challenge.
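A weighted hierarchical rank fusion can be sketched as weighted reciprocal rank fusion applied at two levels. This is a minimal sketch, not the authors' implementation, and it rests on assumptions that the abstract does not state: that the per-document score is reciprocal rank fusion (as the name RRF102 suggests) with the commonly used constant k = 60, that runs are first fused inside their system family (lexical/semantic, BERT, relevance feedback) with equal weights, and that the family weights are hypothetical values to be tuned. The names `weighted_rrf` and `hierarchical_rrf` are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence

Ranking = Sequence[str]  # document ids, best first


def weighted_rrf(rankings: Sequence[Ranking], weights: Sequence[float], k: int = 60) -> List[str]:
    """Weighted reciprocal rank fusion: score(d) = sum_i w_i / (k + rank_i(d)).

    Ranks are 1-based; a document absent from a ranking gets no score from it.
    """
    if len(rankings) != len(weights):
        raise ValueError("one weight per ranking is required")
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += weight / (k + rank)
    # Sort by descending fused score; break exact ties by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hierarchical_rrf(
    groups: Mapping[str, Sequence[Ranking]],
    group_weights: Mapping[str, float],
    k: int = 60,
) -> List[str]:
    """Two-level fusion (assumed structure): runs within a group, then groups."""
    fused_per_group: List[List[str]] = []
    weights: List[float] = []
    for name, runs in groups.items():
        # Level 1: equal-weight RRF over all runs in this (hypothetical) group.
        fused_per_group.append(weighted_rrf(runs, [1.0] * len(runs), k))
        weights.append(group_weights[name])
    # Level 2: weighted RRF over the per-group fused rankings.
    return weighted_rrf(fused_per_group, weights, k)


if __name__ == "__main__":
    # Toy runs and weights, invented purely for illustration.
    groups = {
        "lexical": [["d1", "d2", "d3"], ["d2", "d1", "d4"]],
        "bert": [["d3", "d2", "d1"]],
        "feedback": [["d2", "d3", "d5"]],
    }
    group_weights = {"lexical": 1.0, "bert": 2.0, "feedback": 1.0}
    print(hierarchical_rrf(groups, group_weights))
```

Fusing inside each family first keeps a family that contributes many runs from outweighing a family with only a few; the second-level weights then set the balance between families explicitly.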

