IR · CL · Dec 3, 2021

Siamese BERT-based Model for Web Search Relevance Ranking Evaluated on a New Czech Dataset

arXiv:2112.01810
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient relevance ranking in web search engines, though it is incremental as it adapts existing transformer methods to a specific domain.

The authors tackled the challenge of using computationally expensive BERT models for real-time web search ranking by deploying a siamese BERT-based architecture, which improved production performance by over 3% in a commercial search engine.

Web search engines focus on serving highly relevant results within hundreds of milliseconds. Pre-trained language transformer models such as BERT are therefore hard to use in this scenario due to their high computational demands. We present our real-time approach to the document ranking problem leveraging a BERT-based siamese architecture. The model is already deployed in a commercial search engine and it improves production performance by more than 3%. For further research and evaluation, we release DaReCzech, a unique data set of 1.6 million Czech user query-document pairs with manually assigned relevance levels. We also release Small-E-Czech, an Electra-small language model pre-trained on a large Czech corpus. We believe these resources will support the work of both the search relevance and the multilingual research communities.
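The abstract does not spell out the architecture, so the following is only a rough sketch of the general siamese (bi-encoder) idea it refers to: queries and documents are encoded independently, document embeddings can be computed ahead of time, and only the query is encoded at request time. Everything here is an assumption for illustration. `toy_encode` is a hypothetical hashed bag-of-words stand-in for a BERT-style encoder, and `DIM`, `build_index`, and `rank` are made-up names, not the authors' code.

```python
import hashlib
import math

DIM = 64  # hypothetical embedding size, chosen only for this sketch


def toy_encode(text):
    """Stand-in for a transformer encoder (assumption, not the paper's model):
    hashed bag-of-words vector, L2-normalised."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[digest[0] % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def cosine(a, b):
    """Dot product of two already-normalised vectors."""
    return sum(x * y for x, y in zip(a, b))


def build_index(documents):
    """Offline step: encode every document once and store the embedding."""
    return [(doc, toy_encode(doc)) for doc in documents]


def rank(query, index):
    """Online step: encode only the query, then score it against the
    precomputed document embeddings and sort by descending similarity."""
    q = toy_encode(query)
    scored = [(cosine(q, emb), doc) for doc, emb in index]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    index = build_index([
        "praha weather forecast",
        "recipe for svickova",
        "train timetable brno",
    ])
    for score, doc in rank("praha weather forecast", index):
        print(f"{score:.3f}  {doc}")
```

The point of the split is cost: the expensive encoder runs once per document offline, while the per-request work is one query encoding plus cheap vector comparisons. A cross-encoder that reads query and document together would instead need a full model pass for every candidate pair.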

Code Implementations: 1 repo