Cross-Lingual Relevance Transfer for Document Retrieval
This work addresses the challenge of building effective retrieval systems for many languages when training data is limited, though the contribution is incremental, since it combines existing techniques.
The paper tackles the problem of transferring relevance models across languages for document retrieval, showing that multilingual BERT trained on English data improves ranking quality in five diverse languages without any special processing.
Recent work has shown the surprising ability of multilingual BERT to serve as a zero-shot cross-lingual transfer model for a number of language processing tasks. We combine this finding with a similarly recent proposal for sentence-level relevance modeling in document retrieval to demonstrate that multilingual BERT can transfer models of relevance across languages. Experiments on test collections in five languages from diverse language families (Chinese, Arabic, French, Hindi, and Bengali) show that models trained on English data improve ranking quality, without any special processing, for both (non-English) monolingual retrieval and cross-lingual retrieval.
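To make "sentence-level relevance modeling for document retrieval" concrete, the sketch below shows one common way such a model can be used for reranking. It is an illustration under stated assumptions, not the authors' implementation. The assumption is that each candidate document has a first-stage score (for example, BM25) and that a fine-tuned BERT model has scored each of the document's sentences against the query. The document's final score is then an interpolation of its first-stage score with a weighted sum of its top-k sentence scores. The names `aggregate_document_score`, `rerank`, `alpha`, and `weights` are hypothetical. The BERT scoring step is not shown; sentence scores are simply passed in.

```python
from typing import List, Sequence, Tuple

# Hypothetical candidate format: (doc_id, first-stage score, per-sentence scores).
Candidate = Tuple[str, float, Sequence[float]]


def aggregate_document_score(
    doc_score: float,
    sentence_scores: Sequence[float],
    weights: Sequence[float],
    alpha: float,
) -> float:
    """Interpolate a document-level score with weighted top-k sentence scores.

    Assumed formulation (illustrative only):
        score = alpha * doc_score + (1 - alpha) * sum_i weights[i] * top_i
    where top_i is the i-th highest sentence score and k = len(weights).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    # Keep only the k highest-scoring sentences.
    top_k = sorted(sentence_scores, reverse=True)[: len(weights)]
    # zip stops at the shorter sequence, so documents with fewer than k
    # sentences simply contribute fewer sentence terms.
    sentence_evidence = sum(w * s for w, s in zip(weights, top_k))
    return alpha * doc_score + (1.0 - alpha) * sentence_evidence


def rerank(
    candidates: Sequence[Candidate],
    weights: Sequence[float],
    alpha: float,
) -> List[Tuple[str, float]]:
    """Return (doc_id, aggregated score) pairs sorted best first."""
    scored = [
        (doc_id, aggregate_document_score(doc_score, sents, weights, alpha))
        for doc_id, doc_score, sents in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy scores, made up purely for illustration.
    candidates = [
        ("d1", 10.0, [0.2, 0.9, 0.1]),
        ("d2", 8.0, [0.95, 0.9, 0.8]),
    ]
    for doc_id, score in rerank(candidates, weights=[1.0, 0.5, 0.25], alpha=0.5):
        print(doc_id, round(score, 4))
```

Under this assumed setup, cross-lingual transfer would enter only through the sentence scorer: a multilingual BERT model fine-tuned on English relevance data is applied unchanged to sentences in another language. The aggregation step itself is language-agnostic. In practice, `alpha` and `weights` would typically be tuned on held-out queries.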