AI · IR · LG · Jan 8, 2021

Multistage BiCross encoder for multilingual access to COVID-19 health information

arXiv:2101.03013v3 · 4 citations
Originality: Incremental advance
AI Analysis

This work provides a more accurate and efficient method for retrieving reliable COVID-19 health information for users across different languages, which is crucial during a global health crisis.

This paper addresses the challenge of multilingual semantic search for COVID-19 health information. The proposed Multistage BiCross encoder, a three-stage ranking pipeline, achieved state-of-the-art performance in both monolingual and bilingual search scenarios according to nearly all evaluation metrics in the MLIA shared task.

The coronavirus (COVID-19) pandemic has led to a rapidly growing 'infodemic' of health information online. This has motivated the need for accurate semantic search and retrieval of reliable COVID-19 information across millions of documents in multiple languages. To address this challenge, this paper proposes a novel high-precision, high-recall neural approach, the Multistage BiCross encoder. It is a sequential three-stage ranking pipeline that uses the Okapi BM25 retrieval algorithm, a transformer-based bi-encoder, and a transformer-based cross-encoder to rank documents with respect to a given query. We present experimental results from our participation in the Multilingual Information Access (MLIA) shared task on COVID-19 multilingual semantic search. The independently evaluated MLIA results validate our approach and demonstrate that it outperforms other state-of-the-art approaches according to nearly all evaluation metrics in both monolingual and bilingual runs.
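
As a rough illustration of the sequential pipeline described above, here is a minimal sketch, not the authors' implementation. It assumes the stages run in the order listed (BM25, then bi-encoder, then cross-encoder) and that each stage passes only its top candidates to the next. The cutoffs `keep_stage1` and `keep_stage2`, the BM25 parameters, and all function names are hypothetical. The transformer models are replaced by toy stand-ins (`toy_bi_score`, `toy_cross_score`) so that the example runs without any external dependency.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would use a proper one.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for the query.

    Uses the non-negative IDF variant log(1 + (N - df + 0.5) / (df + 0.5)).
    k1 and b are common defaults, not values taken from the paper.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def toy_bi_score(query, doc):
    # Stand-in for a bi-encoder: query and document are "encoded"
    # independently (here as bag-of-words counts), then compared by cosine.
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * \
        math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0


def toy_cross_score(query, doc):
    # Stand-in for a cross-encoder: the pair is scored jointly
    # (here by Jaccard overlap of the two token sets).
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q | d) if q | d else 0.0


def multistage_rank(query, docs, bi_score, cross_score,
                    keep_stage1=100, keep_stage2=10):
    """Return document indices, best first, after three ranking stages."""
    # Stage 1: cheap lexical retrieval over the whole collection.
    s1 = bm25_scores(query, docs)
    cand = sorted(range(len(docs)), key=lambda i: s1[i], reverse=True)
    cand = cand[:keep_stage1]
    # Stage 2: re-rank the survivors with the bi-encoder score.
    s2 = {i: bi_score(query, docs[i]) for i in cand}
    cand = sorted(cand, key=lambda i: s2[i], reverse=True)[:keep_stage2]
    # Stage 3: re-rank the short list with the costlier cross-encoder score.
    s3 = {i: cross_score(query, docs[i]) for i in cand}
    return sorted(cand, key=lambda i: s3[i], reverse=True)
```

The usual motivation for this kind of cascade is cost: the cheapest scorer sees every document, while the most expensive one only sees a short list. In a real system the two toy scorers would be replaced by calls to trained transformer models.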

Code Implementations: 1 repo
