RRF102: Meeting the TREC-COVID Challenge with a 100+ Runs Ensemble
This work addresses the challenge of information retrieval over biomedical data during a pandemic, but its contribution is incremental: it builds on existing methods by combining them in an ensemble.
The paper tackles the problem of building a search engine for a rapidly evolving biomedical collection in the TREC-COVID challenge. It proposes a weighted hierarchical rank fusion approach that ensembles 102 runs, and reports state-of-the-art performance in rounds 4 and 5.
In this paper, we report the results of our participation in the TREC-COVID challenge. To meet the challenge of building a search engine for a rapidly evolving biomedical collection, we propose a simple yet effective weighted hierarchical rank fusion approach that ensembles 102 runs from (a) lexical and semantic retrieval systems, (b) pre-trained and fine-tuned BERT rankers, and (c) relevance feedback runs. Our ablation studies demonstrate the contribution of each of these systems to the overall ensemble. The submitted ensemble runs achieved state-of-the-art performance in rounds 4 and 5 of the TREC-COVID challenge.
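The abstract names the method but does not give its scoring formula. The following is a minimal sketch of what a weighted hierarchical rank fusion could look like, under stated assumptions. It assumes the standard reciprocal rank fusion (RRF) score, weighted per input, score(d) = sum_i w_i / (k + rank_i(d)), with the commonly used constant k = 60. It also assumes a two-level hierarchy: runs inside each group are fused first with equal weights, and the per-group rankings are then fused with group weights. The function names `weighted_rrf` and `hierarchical_rrf`, the group names, and all weights are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

# A run is a ranked list of document IDs for one query, best first.
Run = List[str]


def weighted_rrf(runs: Sequence[Run], weights: Sequence[float], k: float = 60.0) -> Run:
    """Weighted reciprocal rank fusion (assumed formula, not the paper's):
    score(d) = sum_i weights[i] / (k + rank_i(d)), using 1-based ranks.
    A document missing from a run gets no contribution from that run."""
    if len(runs) != len(weights):
        raise ValueError("need exactly one weight per run")
    scores: Dict[str, float] = defaultdict(float)
    for run, w in zip(runs, weights):
        for rank, doc in enumerate(run, start=1):
            scores[doc] += w / (k + rank)
    # Highest score first; ties broken by document ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hierarchical_rrf(
    groups: Dict[str, Sequence[Run]],
    group_weights: Dict[str, float],
    k: float = 60.0,
) -> Run:
    """Hypothetical two-level fusion: fuse the runs inside each group with
    equal weights, then fuse the per-group rankings using group_weights."""
    names = sorted(groups)  # fixed group order for reproducibility
    fused_per_group = [
        weighted_rrf(groups[g], [1.0] * len(groups[g]), k) for g in names
    ]
    return weighted_rrf(fused_per_group, [group_weights[g] for g in names], k)


if __name__ == "__main__":
    # Toy runs with made-up document IDs; group names mirror (a) and (b) above.
    groups = {
        "lexical": [["d1", "d2", "d3"], ["d2", "d1", "d4"]],
        "bert": [["d3", "d2", "d1"]],
    }
    print(hierarchical_rrf(groups, {"lexical": 1.0, "bert": 1.0}))
```

Fusing within groups before fusing across them keeps a large group (for example, many lexical variants) from outvoting a small one simply by contributing more runs; the group weights then control each family's influence directly.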